# Changelog

## 0.4.3 (2020-11-13)

### Bug fixes
- The mechanism meant to provide an early warning for potential ``MANIFEST``
corruption was flaky, and would sometimes report a problem where none existed.

### New features
- Implementation of an optional part of raft, pre-vote. This should prevent partitioned,
or otherwise flaky rejoining servers from triggering unnecessary and disruptive elections.
A node will first issue an experimental voting round before advancing its term, and start campaigning
in earnest only if it has a good chance of winning.
- Ability to demote a full node to observer through command ``raft-demote-to-observer``.
- Print warnings in the logs whenever write-stalls are triggered.
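
The pre-vote idea described above can be sketched roughly as follows. This is a minimal illustration, not QuarkDB's implementation: ``PreVoteReply`` and ``term_to_campaign_with`` are hypothetical names, and a real pre-vote round also involves log freshness checks and timeouts that are left out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreVoteReply:
    # Hypothetical reply type: True if the peer would vote for us at term + 1.
    granted: bool


def term_to_campaign_with(
    current_term: int, replies: List[PreVoteReply], cluster_size: int
) -> Optional[int]:
    """Return the term to campaign with, or None if the node should stay put.

    The term is only advanced when the trial round suggests that a strict
    majority of the cluster (counting ourselves) would grant a real vote.
    """
    votes = 1 + sum(1 for reply in replies if reply.granted)
    if votes > cluster_size // 2:
        return current_term + 1
    return None
```

The point of the sketch is that a partitioned node which cannot collect a majority of pre-votes never increments its term, so it has no inflated term with which to disrupt the cluster when it rejoins.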

### Improvements
- Show resilvering progress in ``raft-info``.
- Checkpoint creation through ``quarkdb-checkpoint`` will now fail if a different
physical filesystem is specified.
- RPMs now available for CentOS 8.
- Print explicit warnings in the log in case of write stalling.
- Reduce default trimming batch size to 200k.
- Add in-memory cache for leases to significantly speed up all lease-related operations.

Many thanks to Franck Eyraud (JRC) for the bug report concerning the erroneous
``MANIFEST``-related warning.

## 0.4.2 (2020-03-12)

### Bug fixes
- Under complicated conditions (follower is very far behind leader + network instabilities),
replication towards a particular follower could become stuck. (To work around this, restart the leader node.)
- Running ``DEL`` on a lease key would cause all nodes in a cluster to crash
with an assertion. ``DEL`` will now simply release the given lease, as if
``lease-release`` had been called.

### New features
- Implement command ``quarkdb-verify-checksum`` for manually running a full checksum scan.
- Addition of ``quarkdb-validate-checkpoint`` tool for ensuring that a given
checkpoint is valid -- useful to run in backup scripts before streaming a given
checkpoint for long-term storage.

### Improvements
- Security hardening of the redis parser for unauthenticated clients.
- Package and distribute ``quarkdb-ldb`` tool based on the one provided by RocksDB.
- Attempt to detect potential ``MANIFEST`` corruption early by measuring mtime lag
compared to newest SST file.
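
To illustrate what "measuring mtime lag" can mean in practice, here is a minimal sketch that compares the modification time of the newest ``MANIFEST-*`` file against the newest ``*.sst`` file in a RocksDB directory. The function names and the one-hour threshold are assumptions made for this example; the actual check and threshold used by QuarkDB may differ.

```python
from pathlib import Path
from typing import Optional


def manifest_mtime_lag(db_dir: str) -> Optional[float]:
    """Seconds by which the newest SST file is newer than the newest MANIFEST.

    Returns None when the directory holds no MANIFEST or no SST files.
    """
    root = Path(db_dir)
    manifests = [p.stat().st_mtime for p in root.glob("MANIFEST-*")]
    ssts = [p.stat().st_mtime for p in root.glob("*.sst")]
    if not manifests or not ssts:
        return None
    return max(ssts) - max(manifests)


def looks_suspicious(db_dir: str, max_lag_seconds: float = 3600.0) -> bool:
    # The threshold is an arbitrary placeholder, not a recommended value.
    lag = manifest_mtime_lag(db_dir)
    return lag is not None and lag > max_lag_seconds
```

A large positive lag means SST files keep being written while the ``MANIFEST`` has not been touched for a long time, which is the kind of signal the early warning is meant to pick up.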

Many thanks to Crystal Chua (AARNet) for the bug report and all support offered
related to RocksDB's ``MANIFEST`` corruption issue, as well as to Pete Eby (ORNL)
for finding and reporting the bug causing replication to become stuck.

## 0.4.1 (2020-01-17)

### Bug fixes
- Fixed ability to subscribe to multiple channels with one command, when push types
are active. Previously, the server would erroneously send one "OK" response per
channel subscribed, breaking QClient.

### New features
- Possibility to choose between three different journal fsync policies through
``RAFT-SET-FSYNC-POLICY`` command.
- Implementation of ``CLIENT GETNAME``, and automatic tagging of intercluster
connections.

### Improvements
- Automatic fsync of the raft journal once per second.
- Better cluster resilience in case of sudden machine power cuts.

Many thanks to Franck Eyraud (JRC) for the bug reports relating to sudden poweroff, and valuable discussion on fsync behavior.

## 0.4.0 (2019-12-06)

### Bug fixes
- Locality hints ending with a pipe symbol (|) could subsequently trigger an
assertion and crash when encountered during ``LHSCAN``, due to faulty key parsing code.
The pipe symbol (|) has special meaning inside internal QuarkDB keys, and is used
to escape field separators (#).

### New features
- Addition of ``quarkdb-server`` binary to allow running QDB without XRootD.
- Add support for ``CLIENT SETNAME`` command as aid in debugging.

### Improvements
- Improvements to replication behaviour when one of the followers is very far behind the leader.
Previously, an excessive number of entries were kept in the request pipeline, which
wasted memory and could potentially trigger OOM.
- Switch to CLI11 for command line argument parsing.
- Upgrade rocksdb dependency to v6.2.4.

## 0.3.9 (2019-09-20)

### Bug fixes
- ``DEQUE-SCAN-BACK`` was returning the wrong cursor to signal end of
iteration: ``next:0`` while it should have been ``0``.
- A race condition was sometimes causing elections to fail spuriously.
Establishing a stable quorum would occasionally require slightly more election
rounds than it should have.
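
The cursor fix matters because scan-style clients typically loop until the server hands back the terminating cursor ``0``. A minimal sketch of such a loop follows; ``scan_page`` is a hypothetical callable standing in for one round-trip of a scan command, and is not part of any QuarkDB client API.

```python
from typing import Callable, List, Tuple


def scan_all(
    scan_page: Callable[[str], Tuple[str, List[str]]], start_cursor: str = "0"
) -> List[str]:
    """Collect every item from a cursor-based scan.

    scan_page(cursor) is assumed to return (next_cursor, items). Iteration
    ends only when the returned cursor is exactly "0".
    """
    results: List[str] = []
    cursor = start_cursor
    while True:
        cursor, items = scan_page(cursor)
        results.extend(items)
        if cursor == "0":
            return results
```

A client written this way would not recognise ``next:0`` as the end of iteration, and would keep issuing requests instead of stopping.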

### New features
- Implementation of health indicators through ``QUARKDB-HEALTH`` command.
- Added support for RESPv3 push types, activated on a per-client basis through
``ACTIVATE-PUSH-TYPES`` command.
- Implementation of ``LHLOCDEL`` command for conditionally deleting a locality hash
field, only if the provided hint matches.
- Add convenience command ``DEQUE-CLEAR``.
- Add support for ``MATCHLOC`` in ``LHSCAN``, used to filter out results based
on locality hint.
- Add ``RECOVERY-SCAN`` command for scanning through complete keyspace, including
internal rocksdb keys.
- Add tool ``quarkdb-sst-inspect`` to allow low-level inspection of SST files.
- Add command ``RAFT-JOURNAL-SCAN`` to make searching through the contents of the
raft journal easier.

### Improvements
- Protection for a strange case of corruption which brought down a development
test cluster. (last-applied jumped ahead of commit-index by 1024, causing all
writes to stall). From now on, a similar kind of corruption should only take out
a single node, and not spread to the entire cluster.
- ``KEYS`` is now implemented in terms of ``SCAN``, making prefix matching of the
keyspace just as efficient as with ``SCAN``. (Note: The use of ``KEYS`` is still
generally discouraged due to potentially huge response size)
- Removed unused tool ``quarkdb-scrub``.

## 0.3.8 (2019-05-27)
- Prevent elections from hanging on the TCP timeout when one of the member hosts
is dropping packets, which could bring down an otherwise healthy cluster.
- Prevent crashing when ``LHSCAN`` is provided with a cursor missing the field
component.
- Make request statistics available through ``command-stats`` command.
- Addition of configuration file path to ``quarkdb-info``.
- Print a simple error message when the given path to ``quarkdb-create`` already exists,
instead of a stacktrace.

## 0.3.7 (2019-04-24)
- Heavy use of lease commands could cause performance degradation and latency
spikes, due to accumulation of long-lived deletion tombstones on expiration
events. Starting from this release, such tombstones should disappear much
more quickly without accumulating.
- Fixed regression introduced in 0.3.6, which made it impossible to create new
bulkload nodes. (Note: to work around this, delete the directory ``/path/to/bulkload/current/state-machine``
right after running ``quarkdb-create``.)
- Addition of ``LEASE-GET-PENDING-EXPIRATION-EVENTS`` command to list all
pending lease expiration events, and ``RAW-SCAN-TOMBSTONES`` to inspect tombstones
as an aid in debugging.
- Expanding a cluster is now easier: there is no need to pass ``--nodes`` to ``quarkdb-create``
when creating a new node for an established cluster.

## 0.3.6 (2019-03-21)
- Improved memory management and recycling, putting less pressure on the global
memory allocator.
- Addition of pub/sub support, commands implemented: ``PUBLISH``, ``SUBSCRIBE``,
  ``PSUBSCRIBE``, ``UNSUBSCRIBE``, ``PUNSUBSCRIBE``
- Addition of ``--steal-state-machine`` flag to ``quarkdb-create`` to make
  transition out of bulkload mode easier and less error-prone.
- Updated rocksdb to v5.18.3.

## 0.3.5 (2018-11-28)
- Updated rocksdb dependency to v5.17.2.
- Improved output of backup command ``raft-checkpoint``. The generated directory
can be used directly to spin-up a full QuarkDB node, without manual tinkering.
Command aliased to ``quarkdb-checkpoint``.
- Removed flood of ``attempting connection to .. `` messages when some node is unavailable.
- Light refactoring, more widespread use of ``std::string_view``, which paves the
way for certain performance optimizations in the future.
- Fixed several flaky tests.

## 0.3.4 (2018-10-09)
- Updated rocksdb dependency to v5.15.10.
- Added `TYPE` command.
- A read-only MULTI immediately after a read-write MULTI could
  cause the cluster to crash.
- Added command `LHSCAN` for scanning through a locality hash.
- Added convenience command in recovery mode for performing forced membership updates.
- It's now possible to issue one-off commands from the recovery tool, without setting up a server.

## 0.3.3 (2018-09-14)

### Added
- Ability to run QuarkDB in raft mode with only a single node. This allows starting
with a single node, and growing the cluster in the future if needed, which is
cumbersome to do in standalone mode.
- Added `quarkdb-version` command, which simply returns the current quarkdb version.
- Commands `deque-scan-back`, `deque-trim-front`.

### Changed
- Dropped list commands, as the underlying implementation makes it impossible to support all list commands found in official redis. (most notably `linsert`)

  Still, the old implementation makes for an excellent deque. Renamed list commands `lpush`, `lpop`, `rpush`, `rpop`, `llen` to `deque-push-front`, `deque-pop-front`, `deque-push-back`, `deque-pop-back`, `deque-len`. The underlying data format did not change, only the command names.

  This makes it possible to implement lists properly in the future.

  No one has been using the list operations, which gives us the opportunity to change the command names.
- Minor code and performance improvements.

## 0.3.2 (2018-08-14)
### Added
- Protection against 1-way network partitions, in which a cluster node
  is able to establish TCP connections to others, but the rest cannot do the same.

  This resulted in cluster disruption as the affected node would not be receiving
  heartbeats, but could still repeatedly attempt to get elected.

  From now on, a node which has been vetoed will abstain from starting election
  rounds until it has received fresh heartbeats since receiving that veto. This
  will prevent a 1-way network partitioned node from causing extensive cluster
  disruption.
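
The rule described above can be sketched as a small gate in front of the election timer. This is only an illustration under simplifying assumptions: the ``ElectionGate`` class and its method names are hypothetical, and timestamps are passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElectionGate:
    """Tracks whether a node is currently allowed to start an election."""

    last_veto: Optional[float] = None
    last_heartbeat: Optional[float] = None

    def on_veto(self, now: float) -> None:
        self.last_veto = now

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def may_start_election(self) -> bool:
        # Never vetoed: nothing holds the node back.
        if self.last_veto is None:
            return True
        # Vetoed: only a heartbeat received after the veto lifts the block.
        return self.last_heartbeat is not None and self.last_heartbeat > self.last_veto
```

A node that can send but not receive traffic never sees a fresh heartbeat, so after its first veto it stays quiet instead of repeatedly triggering election rounds.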

### Fixed
- A couple of minor memory leaks.

## 0.3.1 (2018-08-03)
### Added
- Command `hclone` for creating identical copies of entire hashes.

### Fixed
- An `EXEC` when not inside a `MULTI` would cause a crash.

## 0.2.9 (2018-07-16)
### Added
- Commands `convert-string-to-int`, `convert-int-to-string` to convert between
  binary-string-encoded integers and human-readable-ASCII encoding. Meant for
  interactive use only, to make life easier during low-level debugging when
  needing to edit low-level rocksdb keys, where binary-encoded integers are used.

### Changed
- Refactoring of transactions, we no longer pack / unpack a transaction into a single request within the same node, saving
  CPU cycles.
- Explicitly block zero-sized strings when parsing the redis protocol, print appropriate warning.

### Fixed
- In certain cases, such as when redirecting or reporting unavailability for pipelined writes, fewer
  responses might be provided than expected, causing the client connection to hang. This did not affect
  QClient when redirects are active, as it would shut the connection down and retry upon reception of
  the first such response.

## 0.2.8 (2018-07-04)
### Added
- Support for leases, which can be used as locks with timeouts, allowing QuarkDB to serve as a distributed lock manager.
- Commands `lease-acquire`, `lease-get`, `lease-release`.
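
As an illustration of how leases can act as locks with timeouts, here is a minimal in-memory sketch. The `LeaseTable` class, its method names and its return values are hypothetical and chosen for this example only; they do not reflect the arguments or replies of `lease-acquire`, `lease-get` and `lease-release`.

```python
from typing import Dict, Optional, Tuple


class LeaseTable:
    """Toy lease store: each key has at most one holder until a deadline."""

    def __init__(self) -> None:
        self._leases: Dict[str, Tuple[str, float]] = {}  # key -> (holder, deadline)

    def acquire(self, key: str, holder: str, duration: float, now: float) -> bool:
        current = self._leases.get(key)
        if current is not None:
            owner, deadline = current
            # A live lease held by someone else blocks the request.
            if now < deadline and owner != holder:
                return False
        # Free, expired, or already ours: (re)grant and extend the deadline.
        self._leases[key] = (holder, now + duration)
        return True

    def get(self, key: str, now: float) -> Optional[str]:
        current = self._leases.get(key)
        if current is None or now >= current[1]:
            return None
        return current[0]

    def release(self, key: str) -> bool:
        return self._leases.pop(key, None) is not None
```

Because every lease carries a deadline, a holder that crashes without releasing does not block others forever, which is what makes the pattern usable for distributed locking.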

### Changed
- A newly elected leader now stalls writers in addition to readers, until its leadership marker entry in the raft journal has been committed and applied.

### Fixed
- Sockets and threads from closed connections were not being cleaned due to misunderstanding how XRootD handles connection shutdown. Each connection would hog a socket and a deadlocked thread on the server forever. (oops)
- A particularly rare race condition was able to trigger an assertion in the Raft subsystem, causing the current cluster leader to crash.

## 0.2.7 (2018-06-22)
### Added
- Updated rocksdb dependency to 5.13.4.

### Fixed
- Certain unlikely sequences of pipelined writes were able to trigger an assertion and bring a cluster down, when part of a transaction. Without that assertion, the commands would have left ghost key-value pairs in the rocksdb keyspace.