C string is used only for literal constant in Redis. For the situation that the string can be modified, Redis provide a Simple Dynamic String(SDS). The key in Redis is actually a SDS, and if you insert a key-value with SET msg "hello world", a SDS instance will be constructed for the key. In this case, the value is also a SDS instance with the content of “hello world”.

Besides string, SDS is also used for buffer which can serve the AOF module and client’s input.

This post will dive into the implementation of SDS comparing with classic C string.

The code analyzed in this post is based on 5.0.0

Definition

SDS is defined in sds.h. The struct changed after 3.2 to save memory. Now, Redis offers five types of SDS: sdshdr5, sdshdr8, sdshdr16, sdshdr32, sdshdr64. The length of content determines which struct will be used.

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3

typedef char *sds;

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

Allow me to skip sdshr5 which will never be used. The other four structures all incorporate the following member:

  • buf: the array storing the actual characters and ends with “\0”.
  • len: the length of this SDS, it will exclude the trailing “\0”.
  • alloc the memory allocated for this SDS, but may be used partially. The available size is alloc - len. This variable also exclude the last “\0”.
  • flags: 3 bits of char indicates the type of this SDS.

Then you may ask how does these structures save memory? Firstly, len and alloc are int types with different size. Secondly, __attribute__ ((__packed__)) tells the compiler do not add padding between fields of the struct. Redis also takes addressing into considering and implements its own zmalloc rather than malloc.

When someone tries to instantiate a new SDS, a inline function sdsReqTyp will be called to select the corresponding type on account of its length.

static inline char sdsReqType(size_t string_size) {
    if (string_size < 1<<5)
        return SDS_TYPE_5; // less than 2^5
    if (string_size < 1<<8)
        return SDS_TYPE_8; // less than 2^8
    if (string_size < 1<<16)
        return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32)
        return SDS_TYPE_32;
    return SDS_TYPE_64;
#else
    return SDS_TYPE_32;
#endif
}

An Example

Assume a SDS with the content of “Redis”, here is the illustration.

Another case is a SDS storing “Redis” with redundant space.

Benefits Comparing with C String

What’s the benefits of SDS comparing with C string?

  • Get length with O(1) complexity.
  • Avoid buffer overflow.
  • Reduce the frequency of reallocating memory. It will trigger reallocation if you strcat of strtrim a C string.
  • Binary safety. “\0” can be in the middle of a SDS, not in C string. Because C string use “\0” to determine the length, while SDS has the length in the header.

Two useful macros must be explained because they are used widely.

#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))

They can return a header pointer to SDS and ## is a connector. For example, SDS_HDR(8, s) will be expanded to

((struct sdshdr8 *)((s)-(sizeof(struct sdshdr8))))

The tricky point is the type of sds is char* meaning it points to a char array. The advantage is enjoying the built-in functions of C String.

Init a SDS

A new SDS can be created with specified init pointer and initLe. The string is always null-terminated, even you create a sds string with s = sdsnewlen("abc", 3).

In order to init a SDS:

  1. Decide its type according to the length
  2. Allocate memory with built-in zmalloc
  3. Init the header of SDS
  4. Memory copy
sds sdsnewlen(const void *init, size_t initlen) {
    void *sh;
    sds s;
    char type = sdsReqType(initlen);
    /* Empty strings are usually created in order to append. Use type 8
     * since type 5 is not good at this. */
    if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    int hdrlen = sdsHdrSize(type);
    unsigned char *fp; /* flags pointer. */

    sh = s_malloc(hdrlen+initlen+1);
    if (init==SDS_NOINIT)
        init = NULL;
    else if (!init)
        memset(sh, 0, hdrlen+initlen+1);
    if (sh == NULL) return NULL;
    s = (char*)sh+hdrlen;
    fp = ((unsigned char*)s)-1;
    switch(type) {
        case SDS_TYPE_5: {
            *fp = type | (initlen << SDS_TYPE_BITS);
            break;
        }
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_16: {
            SDS_HDR_VAR(16,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_32: {
            SDS_HDR_VAR(32,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_64: {
            SDS_HDR_VAR(64,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
    }
    if (initlen && init)
        memcpy(s, init, initlen);
    s[initlen] = '\0';
    return s;
}

Get Length

With a variable len, Redis can get the length of SDS with time complexity of O(1) which C string must visit all characters until the EOL.

static inline size_t sdslen(const sds s) {
    unsigned char flags = s[-1];
    switch(flags&SDS_TYPE_MASK) {
        case SDS_TYPE_5:
            return SDS_TYPE_5_LEN(flags);
        case SDS_TYPE_8:
            return SDS_HDR(8,s)->len;
        case SDS_TYPE_16:
            return SDS_HDR(16,s)->len;
        case SDS_TYPE_32:
            return SDS_HDR(32,s)->len;
        case SDS_TYPE_64:
            return SDS_HDR(64,s)->len;
    }
    return 0;
}

Avoid Buffer Overflow

For C string, it may cause buffer overflow if you forget allocate enough space for the string when you update it. SDS can avoid this situation because it will check the available memory before modification.

Take sdscat for example, it will append the specified null terminated C string t to the SDS string s.

sds sdscat(sds s, const char *t) {
    return sdscatlen(s, t, strlen(t));
}

sds sdscatlen(sds s, const void *t, size_t len) {
    size_t curlen = sdslen(s);

    s = sdsMakeRoomFor(s,len);
    if (s == NULL) return NULL;
    memcpy(s+curlen, t, len);
    sdssetlen(s, curlen+len);
    s[curlen+len] = '\0';
    return s;
}

sdsMakeRoomFor enlarges the free space at the end of the SDS so that the caller is sure that it can overwrite up to the new SDS after this function.

sds sdsMakeRoomFor(sds s, size_t addlen) {
    void *sh, *newsh;
    size_t avail = sdsavail(s);
    size_t len, newlen;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen;

    /* Return ASAP if there is enough space left. */
    if (avail >= addlen) return s;

    len = sdslen(s);
    sh = (char*)s-sdsHdrSize(oldtype);
    newlen = (len+addlen);
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;

    type = sdsReqType(newlen);

    /* Don't use type 5: the user is appending to the string and type 5 is
     * not able to remember empty space, so sdsMakeRoomFor() must be called
     * at every appending operation. */
    if (type == SDS_TYPE_5) type = SDS_TYPE_8;

    hdrlen = sdsHdrSize(type);
    if (oldtype==type) {
        newsh = s_realloc(sh, hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+hdrlen;
    } else {
        /* Since the header size changes, need to move the string forward,
         * and can't use realloc */
        newsh = s_malloc(hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    sdssetalloc(s, newlen);
    return s;
}

Steps:

  1. Return ASAP if available memory is enough
  2. Reallocate memory
  3. Change header size if needed
  4. Set new length

Reallocation

How does SDS reduce the need for reallocation? The first solution is pre-allocation in sdsMakeRoomFor.

newlen = (len+addlen);
if (newlen < SDS_MAX_PREALLOC)
    newlen *= 2;
else
    newlen += SDS_MAX_PREALLOC;

If the new length is smaller than 1MB, it will multiply by 2, or plus by 1MB.

Back to the SDS instance s with value of “Redis”, append " Cluster" to it by calling sdscat(s, " Cluster"). the new alloc will be (5+8)*2 = 26.

Then it will not reallocate more memory if we append " Tutorial" again.

The second solution is lazy release. If the SDS length shrinks, for example sdstrim, only the len variable is changed rather than deallocate the superfluous space which can be released by explicitly calling sdsRemoveFreeSpace. But after sdsRemoveFreeSpace, the next concatenation operations will require reallocation.

sds sdsRemoveFreeSpace(sds s) {
    void *sh, *newsh;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen, oldhdrlen = sdsHdrSize(oldtype);
    size_t len = sdslen(s);
    sh = (char*)s-oldhdrlen;

    /* Check what would be the minimum SDS header that is just good enough to
     * fit this string. */
    type = sdsReqType(len);
    hdrlen = sdsHdrSize(type);

    /* If the type is the same, or at least a large enough type is still
     * required, we just realloc(), letting the allocator to do the copy
     * only if really needed. Otherwise if the change is huge, we manually
     * reallocate the string to use the different header type. */
    if (oldtype==type || type > SDS_TYPE_8) {
        newsh = s_realloc(sh, oldhdrlen+len+1);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+oldhdrlen;
    } else {
        newsh = s_malloc(hdrlen+len+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    sdssetalloc(s, len);
    return s;
}

Reference