GDAL-VECTOR-PARTITION(1) GDAL GDAL-VECTOR-PARTITION(1)

NAME

gdal-vector-partition - Partition a vector dataset into multiple
files

Added in version 3.12.

SYNOPSIS

Usage: gdal vector partition [OPTIONS] <INPUT> <OUTPUT>

Partition a vector dataset into multiple files.

Positional arguments:
-i, --input <INPUT> Input vector datasets [required]
-o, --output <OUTPUT> Output directory [required]

Common Options:
-h, --help Display help message and exit
--json-usage Display usage as JSON document and exit
--config <KEY>=<VALUE> Configuration option [may be repeated]
-q, --quiet Quiet mode (no progress bar)

Options:
--overwrite Whether overwriting existing output is allowed
Mutually exclusive with --append
--append Whether appending to existing layer is allowed
Mutually exclusive with --overwrite
-f, --of, --format, --output-format <OUTPUT-FORMAT> Output format
--co, --creation-option <KEY>=<VALUE> Creation option [may be repeated]
--lco, --layer-creation-option <KEY>=<VALUE> Layer creation option [may be repeated]
--field <FIELD> Field(s) on which to partition [may be repeated] [required]
--scheme <SCHEME> Partitioning scheme. SCHEME=hive|flat (default: hive)
--pattern <PATTERN> Filename pattern ('part_%010d' for scheme=hive, '{LAYER_NAME}_{FIELD_VALUE}_%010d' for scheme=flat)
--feature-limit <FEATURE-LIMIT> Maximum number of features per file
--max-file-size <MAX-FILE-SIZE> Maximum file size (MB or GB suffix can be used)
--omit-partitioned-field Whether to omit partitioned fields from target layer definition
--skip-errors Skip errors when writing features

Advanced Options:
--if, --input-format <INPUT-FORMAT> Input formats [may be repeated]
--oo, --open-option <KEY>=<VALUE> Open options [may be repeated]

DESCRIPTION

gdal vector partition dispatches features into different files,
depending on the values the features take on a subset of fields
specified by the user.

Two partitioning schemes are available:

o hive, corresponding to Apache Hive partitioning, is the default
one.

Each partitioning field corresponds to a nested directory. Let's
consider a layer with fields "continent" and "country", chosen as
partitioning fields. All features where "continent" evaluates to
"Europe" and "country" evaluates to "France" will be written in
the "continent=Europe/country=France/" subdirectory of the output
directory.

NULL values for partitioning fields are encoded as
__HIVE_DEFAULT_PARTITION__ in the directory name. Non-ASCII
characters, space, equal sign, or characters not compatible with
directory name constraints are percent-encoded (e.g. %20 for
space).

o flat, where files are written directly under the output directory
using a default filename pattern of
{LAYER_NAME}_{FIELD_VALUE}_%010d.
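
As a rough illustration of the hive naming rules described above, the
following Python sketch builds the subdirectory for one feature. It is
not GDAL's implementation: the function name hive_subdirectory is
hypothetical, and urllib.parse.quote with safe="" is only an assumed
approximation of the percent-encoding (the exact set of characters that
GDAL encodes is not specified here). Leaving field names unencoded is
also an assumption.

```python
from urllib.parse import quote

HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def hive_subdirectory(partition_values):
    """Build the relative 'key=value/key=value' directory for one feature.

    partition_values: list of (field_name, value) pairs, in --field order.
    Hypothetical helper, for illustration only.
    """
    parts = []
    for name, value in partition_values:
        if value is None:
            # NULL partitioning values use the Hive placeholder.
            encoded = HIVE_NULL
        else:
            # Assumption: quote(..., safe="") percent-encodes spaces, '=',
            # '/' and non-ASCII characters (as UTF-8 bytes).
            encoded = quote(str(value), safe="")
        parts.append(f"{name}={encoded}")
    return "/".join(parts)


print(hive_subdirectory([("continent", "Europe"), ("country", "France")]))
# continent=Europe/country=France
print(hive_subdirectory([("continent", "Oceania"), ("country", "New Zealand")]))
# continent=Oceania/country=New%20Zealand
print(hive_subdirectory([("continent", "Europe"), ("country", None)]))
# continent=Europe/country=__HIVE_DEFAULT_PARTITION__
```

With the flat scheme no such subdirectory is created, and the field
values appear in the file name instead.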

By default, the format of the input dataset will be used for the
output, if it can be determined and the input driver supports
writing. Otherwise, --format must be used.

gdal vector partition can be used as the last step of a pipeline.

The following options are available:

Standard options

--output <OUTPUT-DIRECTORY>
Root of the output directory. [required]

--field <FIELD-NAME>
Field(s) on which to partition. [required]

Only fields of type String, Integer and Integer64 are allowed.
The order in which fields are specified matters, because it
determines the directory hierarchy.

-f, --of, --format, --output-format <OUTPUT-FORMAT>
Which output vector format to use. Allowed values may be given
by gdal --formats | grep vector | grep rw | sort

--co, --creation-option <NAME>=<VALUE>
Many formats have one or more optional dataset creation
options that can be used to control particulars about the file
created. For instance, the GeoPackage driver supports creation
options to control the version.

May be repeated.

The dataset creation options available vary by format driver,
and some simple formats have no creation options at all. A
list of options supported for a format can be listed with the
--formats command line option but the documentation for the
format is the definitive source of information on driver
creation options. See Vector drivers format specific
documentation for legal creation options for each format.

Note that dataset creation options are different from layer
creation options.

--lco, --layer-creation-option <NAME>=<VALUE>
Many formats have one or more optional layer creation options
that can be used to control particulars about the layer
created. For instance, the GeoPackage driver supports layer
creation options to control the feature identifier or geometry
column name, setting the identifier or description, etc.

May be repeated.

The layer creation options available vary by format driver,
and some simple formats have no layer creation options at all.
A list of options supported for a format can be listed with
the --formats command line option but the documentation for
the format is the definitive source of information on driver
creation options. See Vector drivers format specific
documentation for legal creation options for each format.

Note that layer creation options are different from dataset
creation options.

--overwrite
Allow program to overwrite existing target file or dataset.
Otherwise, by default, gdal errors out if the target file or
dataset already exists.

--append
Whether the output directory must be opened in append mode.
Implies that it already exists and that the output format
supports appending.

This mode is useful when adding new features to an already
existing partitioned dataset.

--scheme hive|flat
Partitioning scheme. Defaults to hive.

--pattern <PATTERN>
Filename pattern. User chosen string, with substitutions for:

o {LAYER_NAME}, when found, is substituted with the layer name
(percent encoded where needed).

o {FIELD_VALUE}, when found, is substituted with the
partitioning field value (percent encoded where needed). If
several partitioning fields are used, each value is
separated by underscore (_). Empty strings are substituted
with __EMPTY__ and null fields with __NULL__.

o %[0?][0-9]?[0]?d: C-style integer formatter for the part
number. Valid values are for example %d or %05d. One and
only one part number specifier must be present in the
pattern.

Default values for the pattern are part_%010d for the hive
scheme, and {LAYER_NAME}_{FIELD_VALUE}_%010d for the flat
scheme.
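
The substitutions listed for --pattern can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not
GDAL's code: expand_pattern and PART_SPEC are hypothetical names, the
regular expression is a simplified reading of the part number
specifier, and urllib.parse.quote stands in for GDAL's percent-encoding.

```python
import re
from urllib.parse import quote

# Simplified match for one part-number specifier: %d, %5d, %05d, %010d...
PART_SPEC = re.compile(r"%0?\d*d")


def expand_pattern(pattern, layer_name, field_values, part_number):
    """Expand a --pattern style string into a file name (no extension)."""
    if len(PART_SPEC.findall(pattern)) != 1:
        raise ValueError("pattern must contain exactly one part number specifier")

    def encode(value):
        if value is None:
            return "__NULL__"
        if value == "":
            return "__EMPTY__"
        return quote(str(value), safe="")  # assumed encoding

    # Format the part number first, so that '%' signs introduced later by
    # percent-encoding are never mistaken for a format specifier.
    name = PART_SPEC.sub(lambda m: m.group(0) % part_number, pattern)
    name = name.replace("{LAYER_NAME}", quote(layer_name, safe=""))
    # Several partitioning values are joined with an underscore.
    name = name.replace("{FIELD_VALUE}", "_".join(encode(v) for v in field_values))
    return name


print(expand_pattern("part_%010d", "cities", ["France"], 1))
# part_0000000001
print(expand_pattern("{LAYER_NAME}_{FIELD_VALUE}_%010d", "cities", ["Europe", "France"], 3))
# cities_Europe_France_0000000003
```

The part number passed in the usage lines is arbitrary; this page does
not state at which number GDAL starts counting.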

--feature-limit <FEATURE-LIMIT>
Maximum number of features per file. By default, unlimited. If
the limit is exceeded, several parts are created.
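
The effect of --feature-limit on a single partition can be pictured
with the short Python sketch below. The function split_into_parts is a
hypothetical helper written for this illustration; it only shows how a
stream of features falls into consecutive parts and says nothing about
how GDAL numbers or writes them.

```python
from itertools import islice


def split_into_parts(features, feature_limit=None):
    """Yield consecutive lists of at most feature_limit features.

    With feature_limit=None (the default, i.e. unlimited), all features
    of the partition end up in a single part.
    """
    iterator = iter(features)
    if feature_limit is None:
        part = list(iterator)
        if part:
            yield part
        return
    while True:
        part = list(islice(iterator, feature_limit))
        if not part:
            return
        yield part


print(list(split_into_parts(range(7), feature_limit=3)))
# [[0, 1, 2], [3, 4, 5], [6]]
```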

--max-file-size <MAX-FILE-SIZE>
Maximum file size (MB or GB suffix can be used). By default,
unlimited. If the limit is exceeded, several parts are
created.

Note that the maximum file size is used as a hint, and might
not be strictly respected, because the file size contributed
by each feature is estimated with a heuristic: the size of the
file itself cannot be reliably used while it is being written.
In particular, the heuristic does not assume any compression,
so for compressed formats, the actual size of a part can be
significantly smaller than the specified limit.
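
To see why a size limit based on estimates behaves as a hint, consider
the following Python sketch. It is purely illustrative: plan_parts is a
hypothetical function, the per-feature sizes are made-up estimates, and
the roll-over rule is an assumption, since this page does not describe
the heuristic GDAL actually uses.

```python
def plan_parts(estimated_sizes, max_bytes):
    """Group per-feature size estimates into parts of about max_bytes.

    Assumed rule: a new part is started when adding the next feature
    would push the running estimate above max_bytes.
    """
    parts, current, current_size = [], [], 0
    for size in estimated_sizes:
        if current and current_size + size > max_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        parts.append(current)
    return parts


print(plan_parts([40, 40, 40, 40], max_bytes=100))
# [[40, 40], [40, 40]]
print(plan_parts([150], max_bytes=100))
# [[150]]
```

The second call shows one way the limit can be exceeded: a single
feature larger than the limit still has to be written somewhere.
Conversely, when the estimates ignore compression, the bytes actually
written for a part can be well below the limit.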

--omit-partitioned-field
Whether to omit partitioned fields from the target layer
definition. Automatically set for Parquet output format and
Hive partitioning.

--skip-errors
Whether failures to write feature(s) should be ignored. Note
that this option sets the size of the transaction unit to one
feature at a time, which may cause severe slowdown when
inserting into databases.

Advanced options

--oo, --open-option <NAME>=<VALUE>
Dataset open option (format specific).

May be repeated.

--if, --input-format <format>
Format/driver name to try when opening the input file(s).
It is generally not necessary to specify it, but it can be
used to skip automatic driver detection when that detection
fails to select the appropriate driver. This option can be
repeated to specify several candidate drivers. Note that
it does not force those drivers to open the dataset. In
particular, some drivers have requirements on file extensions.

May be repeated.

EXAMPLES

Example 1: Create a partition based on the "continent" and "country"
fields

$ gdal vector partition world_cities.gpkg out_directory --field continent,country --format Parquet
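
When post-processing a hive-partitioned output tree such as the one
this command creates, the partitioning values can be recovered from the
directory names. The Python sketch below is a hypothetical helper, not
part of GDAL: parse_hive_path is an invented name, the file name
part_0000000001.parquet in the usage line is an assumed example based
on the default pattern, and urllib.parse.unquote is assumed to reverse
the percent-encoding.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote

HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def parse_hive_path(relative_path):
    """Return {field_name: value} decoded from 'key=value' directories.

    Values come back as strings (or None for the NULL placeholder);
    the original field types are not recovered.
    """
    values = {}
    # Only the directory components are inspected, not the file name.
    for component in PurePosixPath(relative_path).parent.parts:
        key, separator, raw = component.partition("=")
        if not separator:
            continue  # not a 'key=value' directory
        values[key] = None if raw == HIVE_NULL else unquote(raw)
    return values


print(parse_hive_path("continent=Oceania/country=New%20Zealand/part_0000000001.parquet"))
# {'continent': 'Oceania', 'country': 'New Zealand'}
```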

Example 2: Create a partition based on the "country" field, keeping only
cities with a population greater than 1 million, with a flat
partitioning scheme

$ gdal pipeline ! read world_cities.gpkg ! filter --where "pop > 1e6" ! partition out_directory --field country --format GPKG --scheme flat

AUTHOR

Even Rouault <even.rouault@spatialys.com>

COPYRIGHT

1998-2026

March 20, 2026 GDAL-VECTOR-PARTITION(1)