Before starting to talk about partitions, let me explain the situation that led to me researching them. I have been given a system whereby all data is treated at a "dataset" level, where each dataset is put into its own table. The reason the system was developed like this in the first place is because it allowed the tables to be small enough to be held in memory when performing queries. This meant that the database performance was consistently fast, even as the amount of data grew rapidly.
The problem with such a system is that all datasets, and thus tables, are expected to have the exact same schema, which is hard to edit. Imagine how hard the simple act of adding or removing a column is in such a system. This has resulted in some of the table schemas diverging with different indexes, and in some extreme cases, some tables have extra columns. The other slight problem is that SHOW TABLESis now pretty unusable.
How Will Partitions Help?
Partitions are essentially tables in the underlying system. Using partitions will allow us to consolidate the thousands of tables into a single table schema with a new "dataset_id" column telling us which dataset each row belongs to. The partitions will keep queries fast by allowing us to load only a section of the entire table into memory. Loading the entire consolidated table into memory is impossible as it has over 7 million rows and growing fast.
The reason I mention the use case above is because partitions will not help you unless your queries operate only on the data within a partition, in this case the dataset_id. Having queries that operate over multiple partitions will be slow. This means my queries will need to use the dataset_idin the conditional clauses. Partitioning is also unlikely to be of much help unless you have large tables. I wouldn’t even consider using partitions if your tables only contain a few thousand rows.
Partitioning by a column requires that column to be within the primary key. Lots of users create tables that have an auto incrementing primary key called id. If this is the situation you are in, then the solution is to add the column within the primary key, dataset_idin my case. If you want to partition an existing table, then you would need to execute:
ALTER TABLE `[table name]` DROP PRIMARY KEY, ADD PRIMARY KEY (`id` ,`dataset_id`);ALTER TABLE `[table name] `partition by KEY(`dataset_id`)partitions [number of partitions];
Alternatively, if you are creating the table for the first time, then an example is shown below
CREATE TABLE test123 (id INT UNSIGNED AUTO_INCREMENT,dataset_id INT UNSIGNED NOT NULL,field1 INT UNSIGNED NOT NULL,field2 varchar(255) NOT NULL,PRIMARY KEY (id, dataset_id))PARTITION BY KEY(dataset_id)partitions [number of partitions]; Auto Increment Still Behaves The Same
The idfield will still increment by 1 even though you are using a combined primary key.
Number of Partitions, Key() and Hash() Partitioning
In the examples above, I have used the KEYkeyword for creating my partitions. This will result in the partition lookup being an MD5 of the values of the columns specified within the braces. The only difference between KEYand HASHis that HASHuses a user defined expression .
There are other partitioning types, but this was most appropriate for my situation. If you are dealing with a massive log table, then you may wish to partition by the timestamp of the log, in which case a range partitionmay be appropriate so that queries for the most recent logs will result in going to the same partition, which is likely to be kept in memory.
When creating a partitioned table, you will specify the number of partitions which will be immediately created. Data will be put into the partition based on the partitioning type/algorithm. By using a KEY/ HASH, two dataset_idvalues that are very close in value, e.g. 2and 3are likely to be in differentpartitions, unlike a RANGEpartition type. This is actually a good thing because it ensures that my data will spread fairly evenly across the partitions. Consider the alternative where data is stored linearly across partitions so that 1and 2are likely to be on the same partition and 50000and 50001are likely to be stored together on a different one. In such a scenario, fragmentation of the indexes (due to users deleting datasets) would result in an uneven distribution with some partitions containing a lot more data than others. Also since my system just auto increments a unique dataset id for each dataset, my first partition would have to become completely full and become a massive table before I started using the next partition, resulting in nullifying the entire point of partitioning in the first place (smaller tables that can be loaded into memory). In a situation where I have 100 tables for an unsigned integer, the first partition would hold dataset_idvalues between 0 and 42,949,672 ( the maximum value of an unsigned INT is 4294967295). All of my existing dataset_ids would reside within that one partition.
The maximum number of partitions you can have in MySQL prior to 5.6.7was 1024. 5.6.7+can have up to 8192.
chrismoos.com – MYSQL AND PARTITIONING TABLES WITH MILLIONS OF ROWS
Axure汉化版已经发布，版本号Axure 220.127.116.119，下面是截图效果 Axure汉化版文件下载地址：Axure汉化补丁 Axure RP pro 18.104.22.1689 下载地址注册用户名：Axure 序列...
BI中文站 6月7日报道 艾默生·斯帕茨(Emerson Spartz)今年28岁，已婚，是Spartz Inc公司的首席执行官。Spartz Inc是一个网站媒体帝国，旗下的30多家网站专门发布有趣、励志和让人感到不可思议的文章和帖子，其分享量非常...
- First Step on MySQL Learning Path
- MySQL热备份工具innobackupex介绍及应用 —— 简介及安装
- MySQL 二进制日志格式基础(一)