C string is used only for literal constant in Redis. For the situation that the string can be modified, Redis provide a Simple Dynamic String(SDS). The key in Redis is actually a SDS, and if you insert a key-value with SET msg "hello world"
, a SDS instance will be constructed for the key. In this case, the value is also a SDS instance with the content of “hello world”.
Besides string, SDS is also used for buffer which can serve the AOF module and client’s input.
This post will dive into the implementation of SDS comparing with classic C string.
The code analyzed in this post is based on 5.0.0
Definition
SDS is defined in sds.h
. The struct changed after 3.2 to save memory. Now, Redis offers five types of SDS: sdshdr5
, sdshdr8
, sdshdr16
, sdshdr32
, sdshdr64
. The length of content determines which struct will be used.
#define SDS_TYPE_5 0
#define SDS_TYPE_8 1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3
typedef char *sds;
/* Note: sdshdr5 is never used, we just access the flags byte directly.
* However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
uint8_t len; /* used */
uint8_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
uint16_t len; /* used */
uint16_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
uint32_t len; /* used */
uint32_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
uint64_t len; /* used */
uint64_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
Allow me to skip sdshr5
which will never be used. The other four structures all incorporate the following member:
buf
: the array storing the actual characters and ends with “\0”.len
: the length of this SDS, it will exclude the trailing “\0”.alloc
the memory allocated for this SDS, but may be used partially. The available size isalloc - len
. This variable also exclude the last “\0”.flags
: 3 bits ofchar
indicates the type of this SDS.
Then you may ask how does these structures save memory? Firstly, len
and alloc
are int
types with different size. Secondly, __attribute__ ((__packed__))
tells the compiler do not add padding between fields of the struct. Redis also takes addressing into considering and implements its own zmalloc
rather than malloc
.
When someone tries to instantiate a new SDS, a inline function sdsReqTyp
will be called to select the corresponding type on account of its length.
static inline char sdsReqType(size_t string_size) {
if (string_size < 1<<5)
return SDS_TYPE_5; // less than 2^5
if (string_size < 1<<8)
return SDS_TYPE_8; // less than 2^8
if (string_size < 1<<16)
return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
if (string_size < 1ll<<32)
return SDS_TYPE_32;
return SDS_TYPE_64;
#else
return SDS_TYPE_32;
#endif
}
An Example
Assume a SDS with the content of “Redis”, here is the illustration.
Another case is a SDS storing “Redis” with redundant space.
Benefits Comparing with C String
What’s the benefits of SDS comparing with C string?
- Get length with O(1) complexity.
- Avoid buffer overflow.
- Reduce the frequency of reallocating memory. It will trigger reallocation if you
strcat
ofstrtrim
a C string. - Binary safety. “\0” can be in the middle of a SDS, not in C string. Because C string use “\0” to determine the length, while SDS has the length in the header.
Two useful macros must be explained because they are used widely.
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
They can return a header pointer to SDS and ##
is a connector. For example, SDS_HDR(8, s)
will be expanded to
((struct sdshdr8 *)((s)-(sizeof(struct sdshdr8))))
The tricky point is the type of sds
is char*
meaning it points to a char array. The advantage is enjoying the built-in functions of C String.
Init a SDS
A new SDS can be created with specified init
pointer and initLe
. The string is always null-terminated, even you create a sds string with s = sdsnewlen("abc", 3)
.
In order to init a SDS:
- Decide its type according to the length
- Allocate memory with built-in
zmalloc
- Init the header of SDS
- Memory copy
sds sdsnewlen(const void *init, size_t initlen) {
void *sh;
sds s;
char type = sdsReqType(initlen);
/* Empty strings are usually created in order to append. Use type 8
* since type 5 is not good at this. */
if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
int hdrlen = sdsHdrSize(type);
unsigned char *fp; /* flags pointer. */
sh = s_malloc(hdrlen+initlen+1);
if (init==SDS_NOINIT)
init = NULL;
else if (!init)
memset(sh, 0, hdrlen+initlen+1);
if (sh == NULL) return NULL;
s = (char*)sh+hdrlen;
fp = ((unsigned char*)s)-1;
switch(type) {
case SDS_TYPE_5: {
*fp = type | (initlen << SDS_TYPE_BITS);
break;
}
case SDS_TYPE_8: {
SDS_HDR_VAR(8,s);
sh->len = initlen;
sh->alloc = initlen;
*fp = type;
break;
}
case SDS_TYPE_16: {
SDS_HDR_VAR(16,s);
sh->len = initlen;
sh->alloc = initlen;
*fp = type;
break;
}
case SDS_TYPE_32: {
SDS_HDR_VAR(32,s);
sh->len = initlen;
sh->alloc = initlen;
*fp = type;
break;
}
case SDS_TYPE_64: {
SDS_HDR_VAR(64,s);
sh->len = initlen;
sh->alloc = initlen;
*fp = type;
break;
}
}
if (initlen && init)
memcpy(s, init, initlen);
s[initlen] = '\0';
return s;
}
Get Length
With a variable len
, Redis can get the length of SDS with time complexity of O(1) which C string must visit all characters until the EOL.
static inline size_t sdslen(const sds s) {
unsigned char flags = s[-1];
switch(flags&SDS_TYPE_MASK) {
case SDS_TYPE_5:
return SDS_TYPE_5_LEN(flags);
case SDS_TYPE_8:
return SDS_HDR(8,s)->len;
case SDS_TYPE_16:
return SDS_HDR(16,s)->len;
case SDS_TYPE_32:
return SDS_HDR(32,s)->len;
case SDS_TYPE_64:
return SDS_HDR(64,s)->len;
}
return 0;
}
Avoid Buffer Overflow
For C string, it may cause buffer overflow if you forget allocate enough space for the string when you update it. SDS can avoid this situation because it will check the available memory before modification.
Take sdscat
for example, it will append the specified null terminated C string t
to the SDS string s
.
sds sdscat(sds s, const char *t) {
return sdscatlen(s, t, strlen(t));
}
sds sdscatlen(sds s, const void *t, size_t len) {
size_t curlen = sdslen(s);
s = sdsMakeRoomFor(s,len);
if (s == NULL) return NULL;
memcpy(s+curlen, t, len);
sdssetlen(s, curlen+len);
s[curlen+len] = '\0';
return s;
}
sdsMakeRoomFor
enlarges the free space at the end of the SDS so that the caller is sure that it can overwrite up to the new SDS after this function.
sds sdsMakeRoomFor(sds s, size_t addlen) {
void *sh, *newsh;
size_t avail = sdsavail(s);
size_t len, newlen;
char type, oldtype = s[-1] & SDS_TYPE_MASK;
int hdrlen;
/* Return ASAP if there is enough space left. */
if (avail >= addlen) return s;
len = sdslen(s);
sh = (char*)s-sdsHdrSize(oldtype);
newlen = (len+addlen);
if (newlen < SDS_MAX_PREALLOC)
newlen *= 2;
else
newlen += SDS_MAX_PREALLOC;
type = sdsReqType(newlen);
/* Don't use type 5: the user is appending to the string and type 5 is
* not able to remember empty space, so sdsMakeRoomFor() must be called
* at every appending operation. */
if (type == SDS_TYPE_5) type = SDS_TYPE_8;
hdrlen = sdsHdrSize(type);
if (oldtype==type) {
newsh = s_realloc(sh, hdrlen+newlen+1);
if (newsh == NULL) return NULL;
s = (char*)newsh+hdrlen;
} else {
/* Since the header size changes, need to move the string forward,
* and can't use realloc */
newsh = s_malloc(hdrlen+newlen+1);
if (newsh == NULL) return NULL;
memcpy((char*)newsh+hdrlen, s, len+1);
s_free(sh);
s = (char*)newsh+hdrlen;
s[-1] = type;
sdssetlen(s, len);
}
sdssetalloc(s, newlen);
return s;
}
Steps:
- Return ASAP if available memory is enough
- Reallocate memory
- Change header size if needed
- Set new length
Reallocation
How does SDS reduce the need for reallocation? The first solution is pre-allocation in sdsMakeRoomFor
.
newlen = (len+addlen);
if (newlen < SDS_MAX_PREALLOC)
newlen *= 2;
else
newlen += SDS_MAX_PREALLOC;
If the new length is smaller than 1MB, it will multiply by 2, or plus by 1MB.
Back to the SDS instance s
with value of “Redis”, append " Cluster" to it by calling sdscat(s, " Cluster")
. the new alloc
will be (5+8)*2 = 26
.
Then it will not reallocate more memory if we append " Tutorial" again.
The second solution is lazy release. If the SDS length shrinks, for example sdstrim
, only the len
variable is changed rather than deallocate the superfluous space which can be released by explicitly calling sdsRemoveFreeSpace
. But after sdsRemoveFreeSpace
, the next concatenation operations will require reallocation.
sds sdsRemoveFreeSpace(sds s) {
void *sh, *newsh;
char type, oldtype = s[-1] & SDS_TYPE_MASK;
int hdrlen, oldhdrlen = sdsHdrSize(oldtype);
size_t len = sdslen(s);
sh = (char*)s-oldhdrlen;
/* Check what would be the minimum SDS header that is just good enough to
* fit this string. */
type = sdsReqType(len);
hdrlen = sdsHdrSize(type);
/* If the type is the same, or at least a large enough type is still
* required, we just realloc(), letting the allocator to do the copy
* only if really needed. Otherwise if the change is huge, we manually
* reallocate the string to use the different header type. */
if (oldtype==type || type > SDS_TYPE_8) {
newsh = s_realloc(sh, oldhdrlen+len+1);
if (newsh == NULL) return NULL;
s = (char*)newsh+oldhdrlen;
} else {
newsh = s_malloc(hdrlen+len+1);
if (newsh == NULL) return NULL;
memcpy((char*)newsh+hdrlen, s, len+1);
s_free(sh);
s = (char*)newsh+hdrlen;
s[-1] = type;
sdssetlen(s, len);
}
sdssetalloc(s, len);
return s;
}
Reference