0%

COMP9315 DBMS Implementation

Since i have no course to choose, so i have to choose this one. On the other hand, this course can help me to understand better about database. Although this course is called Database Implementation, it does not teach you how to write a database application. Instead, the course is full of various algorithms, taking Postgre as example.

The quiz almost does not change every year. It aims to help you check whether you grasp the knowledge taught on lectures.

Assessment 1

For assessment 1, it needs you to implement a new type called gcoord. You can get more details on https://cgi.cse.unsw.edu.au/~cs9315/23T1/assignment/1/index.php.

Note 1:

Because C Postgre library has special processing on struct storage. You should follow the document standard.

1
2
3
4
5
6
7
8
typedef struct GeoCoord
{
double latitude_value;
double longitude_value;
char latitude_direction;
char longitude_direction;
char location_name[FLEXIBLE_ARRAY_MEMBER];
} GeoCoord;

As shown above, variable-length variable must be declared at the end of the struct.

You can get more details on https://www.postgresql.org/docs/current/xfunc-c.html.

Even though you don’t pay attention to this special usage, malloc can also helps you pass lots of tests provided by the convener.

Note 2:

This problem is about precision.

For function Convert2DMS:

1
2
3
4
It converts the greographical coordinates from decimal to DMS(degree, minute, second). The calculation can be described as:
D = floor(Ddec),
M = floor(60 × |Ddec-D|),
S = floor(3600 × |Ddec-D| - 60 × M),

Because latitude_value in my struct is double and double will cause precision loss when executing operation like:

1
2
3
4
5
6
int main() {
double x = 73.55;
int result = (int)((x - 73) * 60);
printf("result: %d", result); // result: 32
return 0;
}

The result should be 33 but program gives us 32.

Since the coordinate value do not have more than 4 decimal places, so we can solve it as shown below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
double myround(double x) {
double result = (x-round(x));
if (fabs(result) < 0.0001) {
return round(x);
} else {
return x;
}
}

void DMS(char *string, double value, char direction) {
int d_dec, degree, minute, second;
double diff;
degree = (int) floor(value);
diff = (value - degree);
minute = (int) floor(myround(60*diff));
second = (int) floor(myround(3600*diff - 60 * minute));
if (minute == 0) {
sprintf(string, "%d°%c", degree, direction);
} else if (second == 0) {
sprintf(string, "%d°%d'%c", degree, minute, direction);
} else {
sprintf(string, "%d°%d'%d\"%c", degree, minute, second, direction);
}
}

But my friend has better idea:

1
2
3
4
5
6
typedef struct GeoCoord {
int latitude;
int longitude;
}
latitude = (int)((x * 10000) + 0.5);
longitude = (int)((y * 10000) + 0.5);

Note 3:

Because latitude and longitude in my struct are double and double is 64 bit. So we need 64-bit hash.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
static uint32_t
hash_uint64(uint64_t key)
{
key = (~key) + (key << 21);
key = key ^ (key >> 24);
key = key + (key << 3) + (key << 8);
key = key ^ (key >> 14);
key = key + (key << 2) + (key << 4);
key = key ^ (key >> 28);
key = key + (key << 31);
return (uint32_t)key;
}


PG_FUNCTION_INFO_V1(gcoord_hash);
Datum
gcoord_hash(PG_FUNCTION_ARGS)
{
GeoCoord* coord = (GeoCoord*)PG_GETARG_POINTER(0);
uint32_t hash = 0;
hash ^= (uint32_t) strlen(coord->location_name);
hash ^= hash_uint64(coord->latitude_value);
hash ^= hash_uint64(coord->longitude_value);
hash ^= (uint32_t) coord->latitude_direction;
PG_RETURN_UINT32(hash);
}

I think it is too tricky and there should be a better solution.

Assessment 2

The second ass needs you to implement operation Sel and Join. You can find more details on https://cgi.cse.unsw.edu.au/~cs9315/23T1/assignment/2/.

Note1: read page from file

You need to read page from file and for each page, you read tuple from it.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
for (int page_idx = 0; page_idx < total_page; page_idx++) {
buffer* buffer_p = request_page(t, page_idx);

for (int tid = 0; tid < get_tuples_num(buffer_p); tid++) {
Tuple tuple = get_tuple_by_tuple_id(buffer_p, tid);
if (tuple[idx] == cond_val) {
result->tuples[counter++] = tuple;
} else {
free(tuple);
}
}
release_page(buffer_p);
log_release_page(buffer_p->page_id);
}

The thought seems simple but i didn’t come out at the beginning. I read all pages from files and then read tuples from pages.

Note2: buffer pool

Take Sel as example. We check whether current page is stored in buffer pool and if it has been stored, then just read it from pool. Otherwise, we read page from file by page index and then store the page in pool. When storing page in pool, we check whether the number of pages is greater than the buffer limit. If it is greater than buffer limit, then we use buffer replacement to replace page. Otherwise, we directly store the page in buffer pool.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
buffer* request_page(Table* t, int page_index) {
buffer* buffers = buffer_pool->buffers;
int slot = page_in_pool(t->oid, page_index);
UINT buffer_size = buffer_pool->nbufs;
UINT* nvb_p = &buffer_pool->nvb;

if (slot >= 0) {
buffers[slot].usage++;
buffers[slot].pin = 1;
buffer_pool->nhits++;
return &buffers[slot];
}

while (1) {
if (buffers[*nvb_p].pin == 0 && buffers[*nvb_p].usage == 0) {
write_page_to_buffer_pool(t, *nvb_p, page_index);
UINT tmp = *nvb_p;
*nvb_p = (*nvb_p + 1) % buffer_size;
return &buffers[tmp];
} else {
if (buffers[*nvb_p].usage > 0) buffers[*nvb_p].usage--;
*nvb_p = (*nvb_p + 1) % buffer_size;
}
}
}

void write_page_to_buffer_pool(Table* t, UINT slot, UINT page_index) {
FileInfo* file_info = open_file_by_table_name(t->name);
FILE* fp = file_info->file;

UINT page_size = get_conf()->page_size;
fseek(fp, page_index * page_size, SEEK_SET);
fread(buffer_pool->buffers[slot].page, page_size, 1, fp);
log_read_page(get_page_id(buffer_pool->buffers[slot].page));

buffer_pool->nreads++;
UINT64 page_id;
memcpy(&page_id, buffer_pool->buffers[slot].page, sizeof(UINT64));
sprintf(buffer_pool->buffers[slot].id, "%u-%lu", t->oid, page_id);
buffer_pool->buffers[slot].oid = t->oid;
memcpy(buffer_pool->buffers[slot].table, t, sizeof(Table));
buffer_pool->buffers[slot].page_index = page_index;
buffer_pool->buffers[slot].page_id = page_id;
buffer_pool->buffers[slot].usage = 1;
buffer_pool->buffers[slot].pin = 1;

release_file(file_info);
}

For request_page, it implements the clock-sweep strategy which is the Postgre default buffer replacement policy.

Note3: file pool

Because we have the limit for the number of opened files. So we need to maintain a file pool.

If we want to read or write a page from file, we firstly check whether the file pointer has been stored in file pool by oid. If it has been stored, then we use it directly. Otherwise, we check whether the num of opened files is greater than the file limit. If it is greater than file limit, we use buffer replacement strategy the same as buffer pool. If not, we open a new file and store related oid in file pool.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
FileInfo* open_file_by_table_name(const char* table_name) {
Database* db = get_db();
Table* t = get_table(table_name);
char* table_path = (char*) malloc(sizeof(db->path) + 5);
sprintf(table_path, "%s/%u", db->path, t->oid);
FileInfo* result = request_file(table_path, "r", t->oid);
free(table_path);
free(t);
return result;
}

FileInfo* request_file(const char* filename, const char* mode, UINT oid) {
FileInfo* buffers = file_pool->buffers;
int slot = file_in_pool(oid);
UINT buffer_size = file_pool->nbufs;
UINT* nvb_p = &file_pool->nvb;

if (slot >= 0) {
buffers[slot].usage++;
buffers[slot].pin = 1;
return &buffers[slot];
}

while (1) {
if (buffers[*nvb_p].pin == 0 && buffers[*nvb_p].usage == 0) {
open_file_in_pool(*nvb_p, filename, mode, oid);
UINT tmp = *nvb_p;
*nvb_p = (*nvb_p + 1) % buffer_size;
return &buffers[tmp];
} else {
if (buffers[*nvb_p].usage > 0) buffers[*nvb_p].usage--;
*nvb_p = (*nvb_p + 1) % buffer_size;
}
}
}

void open_file_in_pool(UINT slot, const char* filename, const char* mode, UINT oid) {
if (file_pool->num_opened_files >= file_pool->nbufs) {
close_file_in_pool(slot);
}

file_pool->buffers[slot].file = fopen(filename, mode);
log_open_file(oid);
file_pool->buffers[slot].oid = oid;
file_pool->buffers[slot].pin = 1;
file_pool->buffers[slot].usage = 1;
file_pool->num_opened_files++;
}

Note4: valgrind

The tool is used to check memory leak. It also can check error like:

1
2
3
4
char* result = (char*) malloc(tuple_size);
memcpy(result, curr, tuple_size);

Address 0x4a40299 is 0 bytes after a block of size 25 alloc'd

For the code above, it should be written as below because the last byte is for \0.

1
2
char* result = (char*) malloc(tuple_size+1);
memcpy(result, curr, tuple_size+1);

Final exam

3 programs are all about assessment2. It is not difficult but it is really hard to figure it out in 3 hours because we have another 5 questions to do.

Q1

Note1: convert big tuple to small tuple

Q2

I forgot to use r+. So funny…as a result, i write nothing back to the file.

Conclusion

The course has challenging assessment. Ass1 strengthens my ability to collect information from official documents. Ass2 makes me have better understanding of clock sweep, simple hash join and merge sort join. It also enhances the ability to maintain C pointer.