The setting allow_experimental_analyzer is enabled by default and it switches the query analysis to a new implementation, which has better compatibility and feature completeness. The feature "analyzer" is considered beta instead of experimental. You can restore the old behavior by setting the compatibility to 24.2 or disabling the allow_experimental_analyzer setting.
ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, output_format_parquet_string_as_string, output_format_orc_string_as_string, output_format_arrow_string_as_string. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster lz4 compression method, that's why we set zstd by default. This is controlled by the settings output_format_parquet_compression_method, output_format_orc_compression_method, and output_format_arrow_compression_method. We changed the default to zstd for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). #61817 (Alexey Milovidov).
In the new ClickHouse version, the functions geoDistance, greatCircleDistance, and greatCircleAngle will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes #58476. In previous versions, the function always used Float32. You can switch to the old behavior by setting geo_distance_returns_float64_on_float64_arguments to false or setting compatibility to 24.2 or earlier. #61848 (Alexey Milovidov). Co-authored with Geet Patel.
The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of #55186 and #45409. It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type. To disable the usage of in-memory data parts, do ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait while there are no in-memory data parts and continue the upgrade. #61127 (Alexey Milovidov).
Changed the column name from duration_ms to duration_microseconds in the system.zookeeper table to reflect the reality that the duration is in the microsecond resolution. #60774 (Duc Canh Le).
To increase compatibility with MySQL, the compatibility alias locate now accepts arguments (needle, haystack[, start_pos]) by default. The previous behavior (haystack, needle[, start_pos]) can be restored by setting function_locate_has_mysql_compatible_argument_order = 0. #61092 (Robert Schulze).
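The argument-order change for locate can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not ClickHouse code: the `mysql_order` flag stands in for function_locate_has_mysql_compatible_argument_order, positions are assumed to be 1-based with 0 meaning "not found", and the model counts characters rather than bytes.

```python
def position(haystack: str, needle: str, start_pos: int = 1) -> int:
    """Return the 1-based position of needle in haystack, or 0 if absent."""
    # str.find returns -1 when the needle is missing, which maps to 0 here.
    return haystack.find(needle, start_pos - 1) + 1


def locate(first: str, second: str, start_pos: int = 1, mysql_order: bool = True) -> int:
    """Model of the locate alias.

    mysql_order is a hypothetical stand-in for the setting
    function_locate_has_mysql_compatible_argument_order.
    """
    if mysql_order:
        needle, haystack = first, second   # new default: (needle, haystack)
    else:
        haystack, needle = first, second   # previous order: (haystack, needle)
    return position(haystack, needle, start_pos)
```

With the new default, `locate("bar", "foobar")` finds the needle at position 4. The same call with `mysql_order=False` searches for "foobar" inside "bar" and returns 0, which is the kind of silent change to watch for when upgrading.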
Forbid SimpleAggregateFunction in ORDER BY of MergeTree tables by default, in the same way as AggregateFunction is already forbidden (AggregateFunction is forbidden because its values are not comparable). Use allow_suspicious_primary_key to allow them. #61399 (Azat Khuzhin).
The Ordinary database engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes #52229. #56942 (shabroo).
Allow to attach parts from a different disk (using copy instead of hard link). #60112 (Unalian).
Size-capped Memory tables: controlled by their settings, min_bytes_to_keep, max_bytes_to_keep, min_rows_to_keep and max_rows_to_keep. #60612 (Jake Bamrah).
Separate limits on number of waiting and executing queries. Added new server setting max_waiting_queries that limits the number of queries waiting due to async_load_databases. Existing limits on number of executing queries no longer count waiting queries. #61053 (Sergei Trifonov).
Added a table system.keywords which contains all the keywords from parser. Mostly needed and will be used for better fuzzing and syntax highlighting. #51808 (Nikita Mikhaylov).
Add a new function, getClientHTTPHeader. This closes #54665. Co-authored with @lingtaolf. #61820 (Alexey Milovidov).
Add generate_series as a table function (compatibility alias for PostgreSQL to the existing numbers function). This function generates a table with an arithmetic progression of natural numbers. #59390 (divanik).
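As a rough illustration of what an arithmetic progression from generate_series looks like, here is a small Python model. It assumes the PostgreSQL convention that the upper bound is inclusive and that the step is a positive integer; those details are assumptions of this sketch and are not stated in the entry above.

```python
from typing import Iterator


def generate_series(start: int, stop: int, step: int = 1) -> Iterator[int]:
    """Yield start, start + step, ... up to and including stop (assumed inclusive)."""
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    # range() excludes its end, so stop + 1 makes the bound inclusive.
    return iter(range(start, stop + 1, step))
```

For example, `list(generate_series(0, 10, 3))` gives `[0, 3, 6, 9]`.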
The topK/topKWeighted functions support a new mode that returns the count of each value and its error. #54508 (UnamedRus).
Added function toMillisecond which returns the millisecond component for values of type DateTime or DateTime64. #60281 (Shaun Struwig).
Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make / redirect to the Play UI. #60390 (Alexey Milovidov).
If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns with the value 0.9 by default, which means: for a composite primary key, if a column changes its value for at least 0.9 of all the times, the next columns after it will not be loaded. #60255 (Alexey Milovidov).
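One simplified reading of the 0.9 rule is sketched below in Python. The function name and the way changes are counted (between adjacent index rows) are assumptions made for illustration; the actual implementation may measure the ratio differently.

```python
from typing import Sequence, Tuple


def key_columns_to_keep(index_rows: Sequence[Tuple], threshold: float = 0.9) -> int:
    """Return how many leading primary key columns would stay in memory.

    index_rows is a hypothetical list of primary key tuples, one per index mark.
    A column that changes between adjacent rows in at least `threshold` of the
    transitions is kept, and every column after it is skipped.
    """
    if len(index_rows) < 2:
        return len(index_rows[0]) if index_rows else 0
    num_columns = len(index_rows[0])
    transitions = len(index_rows) - 1
    for col in range(num_columns):
        changes = sum(
            1 for prev, cur in zip(index_rows, index_rows[1:]) if prev[col] != cur[col]
        )
        if changes / transitions >= threshold:
            return col + 1  # this column is nearly unique, the suffix adds little
    return num_columns
```

With a key such as (timestamp, user_id) where the timestamp differs in almost every index row, only the first column would be kept under this model.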
Improve the performance of serialized aggregation method when involving multiple Nullable columns. #55809 (Amos Bird).
Lazy build JSON's output to improve performance of ALL JOIN. #58278 (LiuNeng).
Make HTTP/HTTPS connections with external services, such as AWS S3, reusable for all use cases, even when the response is 3xx or 4xx. #58845 (Sema Checherinda).
Improvements to aggregate functions argMin / argMax / any / anyLast / anyHeavy, as well as ORDER BY {u8/u16/u32/u64/i8/i16/i32/i64} LIMIT 1 queries. #58640 (Raúl Marín).
Trivial optimization for column's filter. Peak memory can be reduced to 44% of the original in some cases. #59698 (李扬).
Execute multiIf function in a columnar fashion when the result type's underlying type is a number. #60384 (李扬).
Drain multiple connections in parallel when a distributed query is finishing. #60845 (lizhuoyu5).
Optimize data movement between columns of a Nullable number or a Nullable string, which improves some micro-benchmarks. #60846 (李扬).
Operations with the filesystem cache will suffer less from the lock contention. #61066 (Alexey Milovidov).
Optimize array join and other JOINs by preventing a wrong compiler's optimization. Close #61074. #61075 (李扬).
If a query with a syntax error contained COLUMNS matcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put to AST. But the letter A in AST means "abstract" which means it should not contain heavyweight objects. Parts of AST can be created and discarded during parsing, including a large number of backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. #61543 (Alexey Milovidov).
Add a new analyzer pass to optimize the IN operator for a single value. #61564 (LiuNeng).
DNSResolver shuffles set of resolved IPs which is needed to uniformly utilize multiple endpoints of AWS S3. #60965 (Sema Checherinda).
Support parallel reading for Azure blob storage. This improves the performance of the experimental Azure object storage. #61503 (SmitaRKulkarni).
Add asynchronous WriteBuffer for Azure blob storage similar to S3. This improves the performance of the experimental Azure object storage. #59929 (SmitaRKulkarni).
Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. #61785 (Daniel Pozo Escalona).
Add a setting parallel_replicas_allow_in_with_subquery = 1 which allows subqueries for IN to work with parallel replicas. #60950 (Nikolai Kochetov).
A change for the "zero-copy" replication: all zero copy locks related to a table have to be dropped when the table is dropped. The directory which contains these locks has to be removed also. #57575 (Sema Checherinda).
Enable output_format_pretty_row_numbers by default. It is better for usability. #61791 (Alexey Milovidov).
In the previous version, some numbers in Pretty formats were not pretty enough. #61794 (Alexey Milovidov).
A long value in Pretty formats won't be cut if it is the single value in the resultset, such as in the result of the SHOW CREATE TABLE query. #61795 (Alexey Milovidov).
Similarly to clickhouse-local, clickhouse-client will accept the --output-format option as a synonym to the --format option. This closes #59848. #61797 (Alexey Milovidov).
If stdout is a terminal and the output format is not specified, clickhouse-client and similar tools will use PrettyCompact by default, similarly to the interactive mode. clickhouse-client and clickhouse-local will handle command line arguments for input and output formats in a unified fashion. This closes #61272. #61800 (Alexey Milovidov).
Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting, output_format_pretty_highlight_digit_groups. #61802 (Alexey Milovidov).
Add ability to override initial INSERT settings via SYSTEM FLUSH DISTRIBUTED. #61832 (Azat Khuzhin).
Enable processors profiling (time spent/in and out bytes for sorting, aggregation, ...) by default. #61096 (Azat Khuzhin).
Support files without format extension in Filesystem database. #60795 (Kruglov Pavel).
Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. #60420 (豪肥肥). I appreciate if you will continue to write it correctly, e.g., JSON 😇, not Json 🤮, but we don't mind if you spell it as you prefer.
Added none_only_active mode for distributed_ddl_output_mode setting. #60340 (Alexander Tokmakov).
The advanced dashboard has slightly better colors for multi-line graphs. #60391 (Alexey Milovidov).
The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. #60692 (Alexey Milovidov).
While running the MODIFY COLUMN query for materialized views, check the inner table's structure to ensure every column exists. #47427 (sunny).
String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes #60726. #60727 (Alexey Milovidov).
Allow declaring Enums in the structure of external data for query processing (this is an immediate temporary table that you can provide for your query). #57857 (Duc Canh Le).
Consider lightweight deleted rows when selecting parts to merge, so the disk size of the resulting part will be estimated better. #58223 (Zhuo Qiu).
Now we can use virtual columns in PREWHERE. It's worthwhile for non-const virtual columns like _part_offset. #59033 (Amos Bird). Improved overall usability of virtual columns. Now it is allowed to use virtual columns in PREWHERE (it's worthwhile for non-const virtual columns like _part_offset). Now a builtin documentation is available for virtual columns as a comment of column in DESCRIBE query with enabled setting describe_include_virtual_columns. #60205 (Anton Popov).
Instead of using a constant key, now object storage generates key for determining remove objects capability. #59495 (Sema Checherinda).
Allow "local" as object storage type instead of "local_blob_storage". #60165 (Kseniia Sumarokova).
Parallel flush of pending INSERT blocks of Distributed engine on DETACH/server shutdown and SYSTEM FLUSH DISTRIBUTED (Parallelism will work only if you have multi-disk policy for a table (like everything in the Distributed engine right now)). #60225 (Azat Khuzhin).
An improvement for the MySQL compatibility protocol. The issue #57598 mentions a variant behaviour regarding transaction handling. An issued COMMIT/ROLLBACK when no transaction is active is reported as an error contrary to MySQL behaviour. #60338 (PapaToemmsn).
Keeper improvement: support leadership_expiry_ms in Keeper's settings. #60806 (Brokenice0415).
Always infer exponential numbers in JSON formats regardless of the setting input_format_try_infer_exponent_floats. Add setting input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects that allows to use String type for ambiguous paths instead of an exception during named Tuples inference from JSON objects. #60808 (Kruglov Pavel).
Add a flag for the full-sorting merge join algorithm to treat null as biggest/smallest, so the behavior can be compatible with other SQL systems, like Apache Spark. #60896 (loudongfeng).
Support detecting the output format by file extension in clickhouse-client and clickhouse-local. #61036 (豪肥肥).
Update memory limit in runtime when Linux's CGroups value changed. #61049 (Han Fei).
Add the function toUInt128OrZero, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases FROM_UNIXTIME and DATE_FORMAT (they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. #61114 (Alexey Milovidov).
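The "OrZero" family returns zero instead of throwing when the input cannot be parsed. A minimal Python model of that idea for the UInt128 range follows. The accepted input syntax (plain ASCII digits only) is an assumption of this sketch and may be narrower or wider than what ClickHouse accepts.

```python
UINT128_MAX = 2**128 - 1


def to_uint128_or_zero(text: str) -> int:
    """Parse text as an unsigned 128-bit integer, returning 0 on any failure."""
    # Only plain ASCII digit strings are accepted in this sketch.
    if text.isascii() and text.isdigit():
        value = int(text)
        if value <= UINT128_MAX:
            return value
    return 0  # unparsable or out of range
```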
Improvements for the access checks, allowing the revocation of unpossessed rights in case the target user doesn't have the revoking grants either. Example: GRANT SELECT ON *.* TO user1; REVOKE SELECT ON system.* FROM user1;. #61115 (pufit).
Now it's possible to specify the attribute merge="true" in config substitutions for subtrees <include from_zk="/path" merge="true">. If this attribute is specified, ClickHouse will merge the subtree with the existing configuration; otherwise, the default behavior is to append the new content to the configuration. #61299 (alesapin).
Use temporary_files_codec setting in all places where we create temporary data, for example external memory sorting and external memory GROUP BY. Before it worked only in partial_merge JOIN algorithm. #61456 (Maksim Kita).
Add a new setting max_parser_backtracks which allows to limit the complexity of query parsing. #61502 (Alexey Milovidov).
The real-time query profiler now works on AArch64. In previous versions, it worked only when a program didn't spend time inside a syscall. #60807 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in an official stable release)
Fix finished_mutations_to_keep=0 for MergeTree (as the docs say, 0 means to keep everything). #60031 (Azat Khuzhin).
Something was wrong with the FINAL optimization, here is how the author describes it: "PartsSplitter invalid ranges for the same part". #60041 (Maksim Kita).
Something was wrong with Apache Hive, which is experimental and not supported. #60262 (shanfengp).
An improvement for experimental parallel replicas: force reanalysis if parallel replicas changed #60362 (Raúl Marín).
Fix usage of plain metadata type with new disks configuration option #60396 (Kseniia Sumarokova).
Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike #60451 (Kruglov Pavel).
Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. #59385 (Kruglov Pavel).
Add sanity check for number of threads and block sizes. #60138 (Raúl Marín).
Don't infer floats in exponential notation by default. Add a setting input_format_try_infer_exponent_floats that will restore previous behaviour (disabled by default). Closes #59476. #59500 (Kruglov Pavel).
Allow alter operations to be surrounded by parenthesis. The emission of parentheses can be controlled by the format_alter_operations_with_parentheses config. By default, in formatted queries the parentheses are emitted as we store the formatted alter operations in some places as metadata (e.g.: mutations). The new syntax clarifies some of the queries where alter operations end in a list. E.g.: ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN c cannot be parsed properly with the old syntax. In the new syntax the query ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c) is obvious. Older versions are not able to read the new syntax, therefore using the new syntax might cause issues if newer and older version of ClickHouse are mixed in a single cluster. #59532 (János Benjamin Antal).
Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. #54901 #60439 (pufit).
Try to detect file format automatically during schema inference if it's unknown in file/s3/hdfs/url/azureBlobStorage engines. Closes #50576. #59092 (Kruglov Pavel).
Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. #58486 (Julia Kartseva).
The user can now specify the template string directly in the query using format_schema_rows_template as an alternative to format_template_row. Closes #31363. #59088 (Shaun Struwig).
Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty convert_to_replicated file in table's data directory (/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/) and that table will be converted automatically on next server start. #57798 (Kirill).
Added query ALTER TABLE table FORGET PARTITION partition that removes ZooKeeper nodes, related to an empty partition. #59507 (Sergei Trifonov). This is an expert-level feature.
Implemented system.dns_cache table, which can be useful for debugging DNS issues. #59856 (Kirill Nikiforov).
The codec LZ4HC will accept a new level 2, which is faster than the previous minimum level 3, at the expense of less compression. In previous versions, LZ4HC(2) and less was the same as LZ4HC(3). Author: Cyan4973. #60090 (Alexey Milovidov).
Implemented system.dns_cache table, which can be useful for debugging DNS issues. New server setting dns_cache_max_size. #60257 (Kirill Nikiforov).
Support single-argument version for the merge table function, as merge(['db_name', ] 'tables_regexp'). #60372 (豪肥肥).
Support negative positional arguments. Closes #57736. #58292 (flynn).
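A small Python sketch of how a positional argument could be resolved against the SELECT list. It assumes that a negative position counts from the end of the list (so -1 is the last column); the entry above does not spell this out, so treat it as an assumption, and `resolve_positional` is a hypothetical helper name.

```python
from typing import Sequence


def resolve_positional(position: int, select_list: Sequence[str]) -> str:
    """Map a 1-based (or negative, counted from the end) position to a column."""
    if position == 0 or abs(position) > len(select_list):
        raise ValueError(f"positional argument {position} is out of range")
    # Positive positions are 1-based; negative ones index from the end.
    index = position - 1 if position > 0 else len(select_list) + position
    return select_list[index]
```

Under this reading, `ORDER BY -1` on `SELECT a, b, c` would sort by `c`.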
Support specifying a set of permitted users for specific S3 settings in config using user key. #60144 (Antonio Andelic).
Added table function mergeTreeIndex. It represents the contents of index and marks files of MergeTree tables. It can be used for introspection. Syntax: mergeTreeIndex(database, table, [with_marks = true]) where database.table is an existing table with MergeTree engine. #58140 (Anton Popov).
Added function seriesOutliersDetectTukey to detect outliers in series data using Tukey's fences algorithm. #58632 (Bhavna Jindal). Keep in mind that the behavior will be changed in the next patch release.
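For readers unfamiliar with Tukey's fences, the textbook method can be sketched in Python as follows. This is the generic algorithm, not ClickHouse's implementation: the quantile interpolation, the multiplier k = 1.5, and the boolean return value are choices made for this sketch, and the function's actual output format may differ.

```python
from typing import List, Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def tukey_outliers(values: Sequence[float], k: float = 1.5) -> List[bool]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if not values:
        return []
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]
```

For the series `[1, 2, 3, 4, 5, 100]` the fences are -1.5 and 8.5, so only the last point is flagged.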
Add function variantType that returns Enum with variant type name for each row. #59398 (Kruglov Pavel).
Support LEFT JOIN, ALL INNER JOIN, and simple subqueries for parallel replicas (only with analyzer). New setting parallel_replicas_prefer_local_join chooses local JOIN execution (by default) vs GLOBAL JOIN. All tables should exist on every replica from cluster_for_parallel_replicas. New settings min_external_table_block_size_rows and min_external_table_block_size_bytes are used to squash small blocks that are sent for temporary tables (only with analyzer). #58916 (Nikolai Kochetov).
Allow concurrent table creation in the Replicated database during adding or recovering a new replica. #59277 (Konstantin Bogdanov).
Implement comparison operator for Variant values and proper Field inserting into Variant column. Don't allow creating Variant type with similar variant types by default (allow under a setting allow_suspicious_variant_types). Closes #59996. Closes #59850. #60198 (Kruglov Pavel).
Improve memory usage for primary key and some other operations. #60050 (Alexey Milovidov).
The tables' primary keys will be loaded in memory lazily on first access. This is controlled by the new MergeTree setting primary_key_lazy_load, which is on by default. This provides several advantages: - it will not be loaded for tables that are not used; - if there is not enough memory, an exception will be thrown on first use instead of at server startup. This provides several disadvantages: - the latency of loading the primary key will be paid on the first query rather than before accepting connections; this theoretically may introduce a thundering-herd problem. This closes #11188. #60093 (Alexey Milovidov).
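The trade-off described above is the usual one for lazy initialization. A minimal Python sketch of the pattern, with a hypothetical `LazyPrimaryKey` class and a caller-supplied loader, shows where the cost and the possible failure move to:

```python
from typing import Callable, List, Optional


class LazyPrimaryKey:
    """Load the index on first access instead of at construction time.

    Illustrative only: this model is not thread-safe, which is exactly where
    the thundering-herd concern mentioned above would come from.
    """

    def __init__(self, loader: Callable[[], List[int]]) -> None:
        self._loader = loader
        self._index: Optional[List[int]] = None
        self.loads = 0  # how many times the loader actually ran

    @property
    def value(self) -> List[int]:
        if self._index is None:
            # Any failure (for example, not enough memory) surfaces here,
            # on first use, rather than at startup.
            self._index = self._loader()
            self.loads += 1
        return self._index
```

Constructing the object costs nothing; the first read of `value` pays the loading latency, and later reads reuse the cached index.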
Vectorized function dotProduct which is useful for vector search. #60202 (Robert Schulze).
Add short-circuit ability for dictGetOrDefault function. Closes #52098. #57767 (jsc0218).
Keeper improvement: cache only a certain amount of logs in-memory controlled by latest_logs_cache_size_threshold and commit_logs_cache_size_threshold. #59460 (Antonio Andelic).
Optimize if function when the input type is Map, the speed-up is up to ~10x. #59413 (李扬).
Improve performance of the Int8 type by implementing strict aliasing (we already have it for UInt8 and all other integer types). #59485 (Raúl Marín).
Optimize performance of sum/avg conditionally for bigint and big decimal types by reducing branch miss. #59504 (李扬).
Improve performance of SELECTs with active mutations. #59531 (Azat Khuzhin).
Optimized function isNotNull with AVX2. #59621 (李扬).
Improve ASOF JOIN performance for sorted or almost sorted data. #59731 (Maksim Kita).
The previous default value of 1 MB for async_insert_max_data_size turned out to be too small. The new default is 10 MiB. #59536 (Nikita Mikhaylov).
Use multiple threads while reading the metadata of tables from a backup while executing the RESTORE command. #60040 (Vitaly Baranov).
Now if StorageBuffer has more than 1 shard (num_layers > 1) background flush will happen simultaneously for all shards in multiple threads. #60111 (alesapin).
When the output format is a Pretty format and a block consists of a single numeric value which exceeds one million, a readable number will be printed to the right of the table. #60379 (rogeryk).
Added settings split_parts_ranges_into_intersecting_and_non_intersecting_final and split_intersecting_parts_ranges_into_layers_final. These settings are needed to disable optimizations for queries with FINAL and needed for debug only. #59705 (Maksim Kita). Actually not only for that - they can also lower memory usage at the expense of performance.
Running ALTER COLUMN MATERIALIZE on a column with DEFAULT or MATERIALIZED expression now precisely follows the semantics. #58023 (Duc Canh Le).
Enabled an exponential backoff logic for errors during mutations. It will reduce the CPU usage, memory usage and log file sizes. #58036 (MikhailBurdukov).
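Exponential backoff in general means that the wait before the next retry doubles after every consecutive failure, up to a cap. The sketch below illustrates the idea; the base delay and the cap are hypothetical values chosen for the example and are not the values ClickHouse uses.

```python
def backoff_delay_ms(failures: int, base_ms: int = 100, max_ms: int = 60_000) -> int:
    """Delay before the next retry after `failures` consecutive errors.

    base_ms and max_ms are illustrative defaults, not ClickHouse settings.
    """
    if failures <= 0:
        return 0  # no failures yet, retry immediately
    return min(base_ms * 2 ** (failures - 1), max_ms)
```

A repeatedly failing mutation is then retried less and less often, which is where the savings in CPU, memory, and log volume come from.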
Add improvement to count the InitialQuery Profile Event. #58195 (Unalian).
Allow to define volume_priority in storage_configuration. #58533 (Andrey Zvonov).
Add support for the Date32 type in the T64 codec. #58738 (Hongbin Ma).
Settings for the Distributed table engine can now be specified in the server configuration file (similar to MergeTree settings), e.g. <distributed> <flush_on_detach>false</flush_on_detach> </distributed>. #59291 (Azat Khuzhin).
Retry disconnects and expired sessions when reading system.zookeeper. This is helpful when reading many rows from system.zookeeper table especially in the presence of fault-injected disconnects. #59388 (Alexander Gololobov).
Do not interpret numbers with leading zeroes as octals when input_format_values_interpret_expressions=0. #59403 (Joanna Hulboj).
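The difference between the two interpretations is easy to see in Python. Both helper functions below are hypothetical and only model the two parsing rules; they are not ClickHouse code.

```python
def parse_decimal(text: str) -> int:
    """Leading zeroes are ignored: the literal is always base 10."""
    return int(text, 10)


def parse_c_style(text: str) -> int:
    """C-style rule: a digit string with a leading zero is read as octal."""
    if len(text) > 1 and text.startswith("0") and text.isdigit():
        return int(text, 8)
    return int(text, 10)
```

The literal `010` is 10 under the first rule and 8 under the second, so the choice silently changes inserted values such as zero-padded identifiers.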
At startup and whenever config files are changed, ClickHouse updates the hard memory limits of its total memory tracker. These limits are computed based on various server settings and cgroups limits (on Linux). Previously, setting /sys/fs/cgroup/memory.max (for cgroups v2) was hard-coded. As a result, cgroup v2 memory limits configured for nested groups (hierarchies), e.g. /sys/fs/cgroup/my/nested/group/memory.max were ignored. This is now fixed. The behavior of v1 memory limits remains unchanged. #59435 (Robert Schulze).
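To see why the nested path matters, consider how cgroup v2 limits stack: a process is constrained by its own group and by every ancestor. The Python sketch below walks from a group up to the root and returns the tightest numeric limit. It takes the memory.max contents as a dictionary so it can run anywhere; the function name and the lookup strategy are assumptions for illustration, and ClickHouse's own lookup may work differently.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def effective_memory_limit(group: str, memory_max: Dict[str, str]) -> Optional[int]:
    """Tightest memory.max among a cgroup and its ancestors, or None if unlimited.

    memory_max maps a cgroup path (relative to the cgroup mount point) to the
    text of its memory.max file, which is either a byte count or "max".
    """
    limit: Optional[int] = None
    path = PurePosixPath(group)
    while True:
        raw = memory_max.get(str(path))
        if raw is not None and raw.strip() != "max":
            value = int(raw)
            limit = value if limit is None else min(limit, value)
        if path == path.parent:  # reached the root
            break
        path = path.parent
    return limit
```

Reading only the root-level file, as the previously hard-coded path did, would miss a 1 GiB limit set on `/my/nested/group` in this model.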
New profile events added to observe the time spent on calculating PK/projections/secondary indices during INSERT-s. #59436 (Nikita Taranov).
Allow to define a starting point for S3Queue with Ordered mode at the creation using a setting s3queue_last_processed_path. #59446 (Kseniia Sumarokova).
Made comments for system tables also available in system.tables in clickhouse-local. #59493 (Nikita Mikhaylov).
system.zookeeper table: previously the whole result was accumulated in memory and returned as one big chunk. This change should help to reduce memory consumption when reading many rows from system.zookeeper, allow showing intermediate progress (how many rows have been read so far) and avoid hitting connection timeout when result set is big. #59545 (Alexander Gololobov).
Now dashboard understands both compressed and uncompressed state of URL's #hash (backward compatibility). Continuation of #59124. #59548 (Amos Bird).
Bumped Intel QPL (used by codec DEFLATE_QPL) from v1.3.1 to v1.4.0. Also fixed a bug in the polling timeout mechanism: we observed that in some cases the timeout doesn't work properly, and if a timeout happens, IAA and CPU may process the buffer concurrently. For now, we make sure the IAA codec status is not QPL_STS_BEING_PROCESSED before falling back to the SW codec. #59551 (jasperzhu).
Do not show a warning about the server version in ClickHouse Cloud because ClickHouse Cloud handles seamless upgrades automatically. #59657 (Alexey Milovidov).
After self-extraction, the temporary binary is moved instead of copied. #59661 (Yakov Olkhovskiy).
Check for stack overflow in parsers even if the user misconfigured the max_parser_depth setting to a very high value. This closes #59622. #59697 (Alexey Milovidov). #60434
Unify XML and SQL created named collection behaviour in Kafka storage. #59710 (Pervakov Grigorii).
Allow uuid in replica_path if CREATE TABLE explicitly has it. #59908 (Azat Khuzhin).
Add column metadata_version of ReplicatedMergeTree table in system.tables system table. #59942 (Maksim Kita).
Keeper improvement: send only Keeper related metrics/events for Prometheus. #59945 (Antonio Andelic).
The dashboard will display metrics across different ClickHouse versions even if the structure of system tables has changed after the upgrade. #59967 (Alexey Milovidov).
When copying an S3 file on GCP, fall back to buffer copy in case GCP returned Internal Error with the GATEWAY_TIMEOUT HTTP error code. #60164 (Maksim Kita).
Short circuit execution for ULIDStringToDateTime. #60211 (Juan Madurga).
Added query_id column for tables system.backups and system.backup_log. Added error stacktrace to error column. #60220 (Maksim Kita).
Connections through the MySQL port now automatically run with setting prefer_column_name_to_alias = 1 to support QuickSight out-of-the-box. Also, settings mysql_map_string_to_text_in_show_columns and mysql_map_fixed_string_to_text_in_show_columns are now enabled by default, affecting also only MySQL connections. This increases compatibility with more BI tools. #60365 (Robert Schulze).
Fix a race condition in JavaScript code leading to duplicate charts on top of each other. #60392 (Alexey Milovidov).
If you want to run initdb scripts every time the ClickHouse container starts, set the environment variable CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS. #59808 (Alexander Nikolaev).
Remove ability to disable generic clickhouse components (like server/client/...), but keep some that require extra libraries (like ODBC or keeper). #59857 (Azat Khuzhin).
Use max_query_size from context in DDLLogEntry instead of hardcoded 4096 #60083 (Kruglov Pavel).
Fix inconsistent formatting of queries containing tables named table. Fix wrong formatting of queries with UNION ALL, INTERSECT, and EXCEPT when their structure wasn't linear. This closes #52349. Fix wrong formatting of SYSTEM queries, including SYSTEM ... DROP FILESYSTEM CACHE, SYSTEM ... REFRESH/START/STOP/CANCEL/TEST VIEW, SYSTEM ENABLE/DISABLE FAILPOINT. Fix formatting of parameterized DDL queries. Fix the formatting of the DESCRIBE FILESYSTEM CACHE query. Fix incorrect formatting of the SET param_... (a query setting a parameter). Fix incorrect formatting of CREATE INDEX queries. Fix inconsistent formatting of CREATE USER and similar queries. Fix inconsistent formatting of CREATE SETTINGS PROFILE. Fix incorrect formatting of ALTER ... MODIFY REFRESH. Fix inconsistent formatting of window functions if frame offsets were expressions. Fix inconsistent formatting of RESPECT NULLS and IGNORE NULLS if they were used after a function that implements an operator (such as plus). Fix idiotic formatting of SYSTEM SYNC REPLICA ... LIGHTWEIGHT FROM .... Fix inconsistent formatting of invalid queries with GROUP BY GROUPING SETS ... WITH ROLLUP/CUBE/TOTALS. Fix inconsistent formatting of GRANT CURRENT GRANTS. Fix inconsistent formatting of CREATE TABLE (... COLLATE). Additionally, I fixed the incorrect formatting of EXPLAIN in subqueries (#60102). Fixed incorrect formatting of lambda functions (#60012). Added a check so there is no way to miss these abominations in the future. #60095 (Alexey Milovidov).
Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6 #60359 (Kruglov Pavel).
Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments #60453 (Raúl Marín).
Fixed a minor bug that prevented distributed table queries sent from either KQL or PRQL dialect clients to be executed on replicas. #59674. #60470 (Alexey Milovidov) #59674 (Austin Kothig).
The setting print_pretty_type_names is turned on by default. You can turn it off to keep the old behavior or SET compatibility = '23.12'. #57726 (Alexey Milovidov).
The MergeTree setting clean_deleted_rows is deprecated, it has no effect anymore. The CLEANUP keyword for OPTIMIZE is not allowed by default (unless allow_experimental_replacing_merge_with_cleanup is enabled). #58316 (Alexander Tokmakov).
Enable various changes to improve the access control in the configuration file. These changes affect the behavior, and you can check them in the access_control_improvements section of config.xml. In case you are not confident, keep the values in the configuration file as they were in the previous version. #58584 (Alexey Milovidov).
Improve the operation of sumMapFiltered with NaN values. NaN values are now placed at the end (instead of randomly) and considered different from any values. -0 is now also treated as equal to 0; since 0 values are discarded, -0 values are discarded too. #58959 (Raúl Marín).
The function visibleWidth will behave according to the docs. In previous versions, it simply counted code points after string serialization, like the lengthUTF8 function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set function_visible_width_behavior to 0, or set compatibility to 23.12 or lower. #59022 (Alexey Milovidov).
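The difference between counting code points and estimating display width can be approximated with Python's unicodedata module. This is a rough model only: it treats combining marks and format characters as zero-width and East Asian wide/full-width characters as two columns, and it does not model tabs or deletes. The exact rules ClickHouse applies may differ.

```python
import unicodedata


def code_point_count(text: str) -> int:
    """Old-style behavior in this model: just count code points."""
    return len(text)


def visible_width(text: str) -> int:
    """Approximate terminal display width of text."""
    width = 0
    for ch in text:
        # Combining marks and format characters (e.g. zero-width space) take no column.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide and full-width characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

An "e" followed by a combining acute accent is two code points but one visible column, while two CJK ideographs are two code points but four columns.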
Kusto dialect is disabled until these two bugs are fixed: #59037 and #59036. #59305 (Alexey Milovidov). Any attempt to use Kusto will result in an exception.
More efficient implementation of the FINAL modifier no longer guarantees preserving the order even if max_threads = 1. If you counted on the previous behavior, set enable_vertical_final to 0 or compatibility to 23.12.
Implement Variant data type that represents a union of other data types. Type Variant(T1, T2, ..., TN) means that each row of this type has a value of either type T1 or T2 or ... or TN or none of them (NULL value). Variant type is available under a setting allow_experimental_variant_type. Reference: #54864. #58047 (Kruglov Pavel).
Certain settings (currently min_compress_block_size and max_compress_block_size) can now be specified at column-level where they take precedence over the corresponding table-level setting. Example: CREATE TABLE tab (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840)) ENGINE = MergeTree ORDER BY tuple();. #55201 (Duc Canh Le).
Allow configuring any kind of object storage with any kind of metadata type. #58357 (Kseniia Sumarokova).
Added null_status_on_timeout_only_active and throw_only_active modes for distributed_ddl_output_mode, which make it possible to avoid waiting for inactive replicas. #58350 (Alexander Tokmakov).
Add function arrayShingles to compute subarrays, e.g. arrayShingles([1, 2, 3, 4, 5], 3) returns [[1,2,3],[2,3,4],[3,4,5]]. #58396 (Zheng Miao).
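As a rough illustration of what "shingles" means here, the following Python sketch models the sliding-window behavior from the example above. The helper name array_shingles is hypothetical, and how ClickHouse treats edge cases (for instance, a length larger than the array) is not modeled.

```python
def array_shingles(arr, length):
    """Return all contiguous subarrays of the given length (illustrative model only)."""
    if length <= 0:
        raise ValueError("shingle length must be positive")
    # One window per start position; no windows if the array is shorter than `length`.
    return [arr[i:i + length] for i in range(len(arr) - length + 1)]

print(array_shingles([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```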
Added functions punycodeEncode, punycodeDecode, idnaEncode and idnaDecode which are useful for translating international domain names to an ASCII representation according to the IDNA standard. #58454 (Robert Schulze).
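To show what these encodings look like, here is a small sketch using Python's built-in punycode and idna codecs. It only illustrates the kind of output to expect; Python's idna codec follows an older IDNA revision, so ClickHouse's functions may differ on some inputs.

```python
# Punycode encodes a single Unicode label using ASCII characters only.
label = "münchen"
puny = label.encode("punycode").decode("ascii")
print(puny)  # mnchen-3ya

# IDNA applies Punycode per label and adds the "xn--" prefix where needed.
domain = "münchen.de"
ascii_domain = domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--mnchen-3ya.de

# Decoding restores the original international domain name.
restored = ascii_domain.encode("ascii").decode("idna")
print(restored == domain)  # True
```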
Add two settings: output_format_compression_level, to change the output compression level, and output_format_compression_zstd_window_log, to explicitly set the compression window size and enable long-range mode for zstd compression when the output compression method is zstd. They apply to INTO OUTFILE and to writing to the table functions file, url, hdfs, s3, and azureBlobStorage. #58539 (Duc Canh Le).
Automatically disable ANSI escape sequences in Pretty formats if the output is not a terminal. Add a new auto mode to the setting output_format_pretty_color. #58614 (Shaun Struwig).
Allow reading Bool values into String in JSON input formats. This is done under the setting input_format_json_read_bools_as_strings, which is enabled by default. #58561 (Kruglov Pavel).
Added function seriesDecomposeSTL which decomposes a time series into a season, a trend and a residual component. #57078 (Bhavna Jindal).
Introduced MySQL Binlog Client for MaterializedMySQL: One binlog connection for many databases. #57323 (Val Doroshchuk).
Intel QuickAssist Technology (QAT) provides hardware-accelerated compression and cryptography. ClickHouse got a new compression codec ZSTD_QAT which utilizes QAT for zstd compression. The codec uses Intel's QATlib and Intel's QAT ZSTD Plugin. Right now, only compression can be accelerated in hardware (a software fallback kicks in if QAT could not be initialized); decompression always runs in software. #57509 (jasperzhu).
Implemented a new way of generating object storage keys for s3 disks. The format can now be defined in re2 regex syntax with the key_template option in the disk description. #57663 (Sema Checherinda).
Table system.dropped_tables_parts contains parts of system.dropped_tables tables (dropped but not yet removed tables). #58038 (Yakov Olkhovskiy).
Add the setting max_materialized_views_size_for_table to limit the number of materialized views attached to a table. #58068 (zhongyuankai).
clickhouse-format improvements: support INSERT queries with VALUES; support comments (use --comments to output them); support --max_line_length option to format only long queries in multiline. #58246 (vdimir).
Attach all system tables in clickhouse-local, including system.parts. This closes #58312. #58359 (Alexey Milovidov).
Added the FROM <Replicas> modifier for the SYSTEM SYNC REPLICA LIGHTWEIGHT query. The FROM modifier ensures that we wait for fetches and drop-ranges only for the specified source replicas, as well as any replica not in zookeeper or with an empty source_replica. #58393 (Jayme Bird).
Added setting update_insert_deduplication_token_in_dependent_materialized_views. This setting allows updating the insert deduplication token with the table identifier during inserts into dependent materialized views. Closes #59165. #59238 (Maksim Kita).
Added statement SYSTEM RELOAD ASYNCHRONOUS METRICS which updates the asynchronous metrics. Mostly useful for testing and development. #53710 (Robert Schulze).
Coordination for parallel replicas is rewritten for better parallelism and cache locality. It has been tested for linear scalability on hundreds of replicas. It also got support for reading in order. #57968 (Nikita Taranov).
Replace the HTTP outgoing buffering with the native ClickHouse buffers. Add bytes counting metrics for interfaces. #56064 (Yakov Olkhovskiy).
Large aggregation states of uniqExact will be merged in parallel in distributed queries. #59009 (Nikita Taranov).
Lower memory usage after reading from MergeTree tables. #59290 (Anton Popov).
More cache-friendly final implementation. Note on the behaviour change: previously queries with FINAL modifier that read with a single stream (e.g. max_threads = 1) produced sorted output without explicitly provided ORDER BY clause. This is no longer guaranteed when enable_vertical_final = true (and it is so by default). #54366 (Duc Canh Le).
Bypass extra copying in ReadBufferFromIStream which is used, e.g., for reading from S3. #56961 (Nikita Taranov).
Optimize the array element function when the input is Array(Map)/Array(Array(Num))/Array(Array(String))/Array(BigInt)/Array(Decimal). The previous implementation did more allocations than needed. The speedup is up to ~6x, especially when the input type is Array(Map). #56403 (李扬).
Read column once while reading more than one subcolumn from it in compact parts. #57631 (Kruglov Pavel).
Rewrite the AST of sum(column + constant) function. This is available as an optimization pass for Analyzer. #57853 (Jiebin Sun).
The evaluation of function match now utilizes skipping indices ngrambf_v1 and tokenbf_v1. #57882 (凌涛).
The evaluation of function match now utilizes inverted indices. #58284 (凌涛).
MergeTree FINAL does not compare rows from same non-L0 part. #58142 (Duc Canh Le).
Speed up iota calls (filling array with consecutive numbers). #58271 (Raúl Marín).
Improve the multiIf function performance when the type is Nullable. #57745 (KevinyhZou).
Add SYSTEM JEMALLOC PURGE for purging unused jemalloc pages, SYSTEM JEMALLOC [ ENABLE | DISABLE | FLUSH ] PROFILE for controlling jemalloc profile if the profiler is enabled. Add jemalloc-related 4LW command in Keeper: jmst for dumping jemalloc stats, jmfp, jmep, jmdp for controlling jemalloc profile if the profiler is enabled. #58665 (Antonio Andelic).
Added comments (brief descriptions) to all columns of system tables. There are several reasons for this: (1) we use system tables a lot, and sometimes it can be very difficult for a developer to understand the purpose and meaning of a particular column; (2) we change system tables a lot (adding new ones or modifying existing ones), and the documentation for them is always outdated, for example, the documentation page for system.parts misses a lot of columns; (3) we would like to eventually generate documentation directly from ClickHouse. #58356 (Nikita Mikhaylov).
Disable max_rows_in_set_to_optimize_join by default. #56396 (vdimir).
Add <host_name> config parameter that allows avoiding resolving hostnames in ON CLUSTER DDL queries and Replicated database engines. This mitigates the possibility of the queue being stuck in case of a change in cluster definition. Closes #57573. #57603 (Nikolay Degterinsky).
Increase load_metadata_threads to 16 for the filesystem cache. It will make the server start up faster. #57732 (Alexey Milovidov).
Add ability to throttle merges/mutations (max_mutations_bandwidth_for_server/max_merges_bandwidth_for_server). #57877 (Azat Khuzhin).
Replaced undocumented (boolean) column is_hot_reloadable in system table system.server_settings by (Enum8) column changeable_without_restart with possible values No, Yes, IncreaseOnly and DecreaseOnly. Also documented the column. #58029 (skyoct).
Cluster discovery supports setting username and password. Closes #58063. #58123 (vdimir).
Support query parameters in ALTER TABLE ... PART. #58297 (Azat Khuzhin).
Create consumers for Kafka tables on the fly (but keep them for some period, kafka_consumers_pool_ttl_ms, since last used). This should fix the problem with statistics for system.kafka_consumers (which were not consumed when nobody read from the Kafka table, leading to a live memory leak and slow table detach). This PR also enables stats for system.kafka_consumers by default again. #58310 (Azat Khuzhin).
Add a setting max_estimated_execution_time to separate max_execution_time and max_estimated_execution_time. #58402 (Zhang Yifan).
Provide a hint when an invalid database engine name is used. #58444 (Bharat Nallan).
Add settings for better control of indexes type in Arrow dictionary. Use signed integer type for indexes by default as Arrow recommends. Closes #57401. #58519 (Kruglov Pavel).
Implement #58575: support the CLICKHOUSE_PASSWORD_FILE environment variable when running the docker image. #58583 (Eyal Halpern Shalev).
When executing some queries that require a lot of streams for reading data, the error "Paste JOIN requires sorted tables only" was previously thrown. Now the number of streams is resized to 1 in that case. #58608 (Yarik Briukhovetskyi).
When comparing a Float32 column and a const string, read the string as Float32 (instead of Float64). #58724 (Raúl Marín).
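Why the width used for parsing the constant matters can be seen in a small Python sketch. It uses struct to round a 64-bit float to 32-bit precision; the helper to_float32 is hypothetical, and parsing the string as a double first and then narrowing it is a simplification of what a direct Float32 parse would do.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

stored = to_float32(0.1)                # what a Float32 column holds for 0.1
as_float64 = float("0.1")               # constant string read as a 64-bit float
as_float32 = to_float32(float("0.1"))   # constant string read as a 32-bit float

print(stored == as_float64)  # False: the two roundings of 0.1 differ
print(stored == as_float32)  # True: same precision on both sides
```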
Improve S3 compatibility, add ECloud EOS storage support. #58786 (xleoken).
Allow KILL QUERY to cancel backups / restores. This PR also makes running backups and restores visible in system.processes. Also, there is a new setting in the server configuration now - shutdown_wait_backups_and_restores (default=true) which makes the server either wait on shutdown for all running backups and restores to finish or just cancel them. #58804 (Vitaly Baranov).
MySQL interface gained support for net_write_timeout and net_read_timeout settings. net_write_timeout is translated into the native send_timeout ClickHouse setting and, similarly, net_read_timeout into receive_timeout. Fixed an issue where it was possible to set MySQL sql_select_limit setting only if the entire statement was in upper case. #58835 (Serge Klochkov).
A better exception message on a conflict when creating a dictionary and a table with the same name. #58841 (Yarik Briukhovetskyi).
Make sure that for custom (created from SQL) disks either filesystem_caches_path (a common directory prefix for all filesystem caches) or custom_cached_disks_base_directory (a common directory prefix only for filesystem caches created from custom disks) is specified in the server config. custom_cached_disks_base_directory has higher priority for custom disks over filesystem_caches_path, which is used if the former is absent. The filesystem cache setting path must lie inside that directory; otherwise an exception is thrown, preventing the disk from being created. This does not affect disks that were created on an older version before the server was upgraded: in that case the exception is not thrown, so that the server can start successfully. custom_cached_disks_base_directory is added to the default server config as /var/lib/clickhouse/caches/. Closes #57825. #58869 (Kseniia Sumarokova).
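The precedence and containment rules in the entry above can be modeled roughly as follows. This is a sketch under assumptions, not the server's logic: the helper names are hypothetical, only absolute POSIX paths are handled, and treating a cache path equal to the base directory as "inside" is an assumption.

```python
import posixpath
from typing import Optional

def pick_base_directory(custom_cached_disks_base_directory: Optional[str],
                        filesystem_caches_path: Optional[str]) -> str:
    """Prefer the custom-disk base directory; fall back to the common one."""
    base = custom_cached_disks_base_directory or filesystem_caches_path
    if base is None:
        raise ValueError("neither base directory is specified in the server config")
    return base

def is_inside(cache_path: str, base: str) -> bool:
    """Check that cache_path lies inside base after normalizing '..' and slashes."""
    path = posixpath.normpath(cache_path)
    root = posixpath.normpath(base)
    return path == root or path.startswith(root.rstrip("/") + "/")

base = pick_base_directory("/var/lib/clickhouse/caches/", "/data/fs_caches/")
print(base)                                                    # /var/lib/clickhouse/caches/
print(is_inside("/var/lib/clickhouse/caches/s3_cache", base))  # True
print(is_inside("/tmp/s3_cache", base))                        # False
```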
MySQL interface gained compatibility with SHOW WARNINGS/SHOW COUNT(*) WARNINGS queries, though the returned result is always an empty set. #58929 (Serge Klochkov).
Display the log level as a descriptive word when structured log formatting in JSON is enabled. #58936 (Tim Liou).
MySQL interface gained support for CAST(x AS SIGNED) and CAST(x AS UNSIGNED) statements via data type aliases: SIGNED for Int64, and UNSIGNED for UInt64. This improves compatibility with BI tools such as Looker Studio. #58954 (Serge Klochkov).
Change working directory to the data path in docker container. #58975 (cangyin).
Added the setting azure_max_unexpected_write_error_retries for Azure Blob Storage; it can also be set from the config under the azure section. #59001 (SmitaRKulkarni).
Allow ignoring schema evolution in the Iceberg table engine and reading all data using the schema specified by the user on table creation or the latest schema parsed from metadata on table creation. This is done under the setting iceberg_engine_ignore_schema_evolution, which is disabled by default. Note that enabling this setting can lead to incorrect results because, if the schema has evolved, all data files will still be read using the same schema. #59133 (Kruglov Pavel).
Prohibit mutable operations (INSERT/ALTER/OPTIMIZE/...) on read-only/write-once storages with a proper TABLE_IS_READ_ONLY error (to avoid leftovers). Avoid leaving leftovers on write-once disks (format_version.txt) on CREATE/ATTACH. Ignore DROP for ReplicatedMergeTree (same as for MergeTree). Fix iterating over s3_plain (MetadataStorageFromPlainObjectStorage::iterateDirectory). Note that the read-only disk is web and the write-once disk is s3_plain. #59170 (Azat Khuzhin).
Fix a bug in the experimental _block_number column which could lead to a logical error during a complex combination of ALTERs and merges. Fixes #56202. Replaces #58601. #59295 (alesapin).
Play UI understands when an exception is returned inside JSON. Adjustment for #52853. #59303 (Alexey Milovidov).
The /binary HTTP handler allows specifying user, host, and, optionally, password in the query string. #59311 (Alexey Milovidov).
Support the FORMAT clause in BACKUP and RESTORE queries. #59338 (Vitaly Baranov).
Function concatWithSeparator now supports arbitrary argument types (instead of only String and FixedString arguments). For example, SELECT concatWithSeparator('.', 'number', 1) now returns number.1. #59341 (Robert Schulze).
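A minimal Python model of the new behavior: every argument is converted to text and then joined. The helper name is hypothetical, and Python's str() is only a stand-in; ClickHouse's text serialization of non-string types may differ.

```python
def concat_with_separator(separator: str, *args) -> str:
    """Join arguments of any type with a separator (illustrative model only)."""
    # str() stands in for converting each argument to its text form.
    return separator.join(str(arg) for arg in args)

print(concat_with_separator(".", "number", 1))  # number.1
```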
Improve aliases for the clickhouse binary (now ch/clickhouse acts as clickhouse-local or clickhouse depending on the arguments) and add bash completion for the new aliases. #58344 (Azat Khuzhin).
Add settings changes check to CI to check that all settings changes are reflected in settings changes history. #58555 (Kruglov Pavel).
Save the whole fuzzer.log as an archive instead of the last 100k lines. tail -n 100000 often removes lines with table definitions. #58821 (Dmitry Novik).
Enable Rust on macOS with Aarch64 (this will add fuzzy search in client with skim and the PRQL language, though I don't think there are people who host ClickHouse on darwin, so it is mostly for fuzzy search in client I would say). #59272 (Azat Khuzhin).
Fix aggregation issue in mixed x86_64 and ARM clusters #59132 (Harry Lee).
Bug Fix (user-visible misbehavior in an official stable release)
Add join keys conversion for nested LowCardinality #51550 (vdimir).
Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) #56132 (Kruglov Pavel).
Fix a bug with projections and the aggregate_functions_null_for_empty setting during insertion. #56944 (Amos Bird).
Fix working with read buffers in StreamingFormatExecutor #57438 (Kruglov Pavel).
Ignore MVs with dropped target table during pushing to views #57520 (Kruglov Pavel).
Eliminate possible race between ALTER_METADATA and MERGE_PARTS #57755 (Azat Khuzhin).
Fix the expressions order bug in group by with rollup #57786 (Chen768959).
A fix for the obsolete "zero-copy" replication feature: Fix lost blobs after dropping a replica with broken detached parts #58333 (Alexander Tokmakov).
Allow users to work with symlinks in user_files_path #58447 (Duc Canh Le).
Fix a crash when graphite table does not have an agg function #58453 (Duc Canh Le).
Delay reading from StorageKafka to allow multiple reads in materialized views #58477 (János Benjamin Antal).
A fix for experimental inverted indices (don't use in production): DROP INDEX of inverted index now removes all relevant files from persistence #59040 (mochi).