Low On Mac Disk Space? Disk Map Visual Analysis And Quick Cleanup

Have you played "Awakening of Kingdoms"? When tens of thousands of players are fighting on the same screen, the server has to determine, within nanoseconds, whether each grid cell on the map is grass or a mountain. If this step lags, your troops will most likely march straight into a wall. Behind it is a quiet storage revolution built around 16 million grid cells of data.

Farewell to the heavy price of text

[
{"x": 0, "y": 0, "type": "grass", "isBlock":false},
{"x": 1, "y": 0, "type": "river", "isBlock":true},
...
]

Imagine a seamless world map of 4000×4000 cells, 16 million in total. If it is stored as JSON, the way a traditional web app would do it, then even a cell that holds little more than a numeric ID takes about 50 bytes on average once the keys, curly brackets, and commas are counted. 16 million times 50 bytes comes to 800MB. That is just one map, and the server's memory would be exhausted long before any game logic runs. Worse, the text has to be parsed before it can be queried, which is a disaster in game logic that is called tens of thousands of times per second. The conclusion is clear: in a high-performance game backend, text formats such as JSON have to go.
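The 50-byte and 800MB figures can be sanity-checked with a rough sketch. It assumes cells shaped like the sample above and Python's default json.dumps spacing; the helper name estimate_json_bytes is made up for illustration.

```python
import json

# One cell in the same shape as the sample above (assumed representative).
cell = {"x": 0, "y": 0, "type": "grass", "isBlock": False}

# Size of a single serialized cell, in bytes, with default json.dumps spacing.
per_cell = len(json.dumps(cell).encode("utf-8"))

def estimate_json_bytes(width, height, bytes_per_cell=50):
    """Rough total size, ignoring the outer brackets and per-cell separators."""
    return width * height * bytes_per_cell

total = estimate_json_bytes(4000, 4000)
print(per_cell)             # roughly 50 bytes for this sample cell
print(total / 1_000_000)    # 800.0 (decimal MB)
```

Real coordinates have more digits than 0, so the per-cell cost only grows from here.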

Binary flat design

We need to move away from text and toward binary. What is the most important attribute of each cell? Its terrain type, such as grassland, sand, or river, which directly determines whether troops can pass. There are usually no more than 256 such types, so one byte is enough. We can design an extremely lean .bin file that "flattens" the two-dimensional map matrix into a one-dimensional array. Cells are stored in row-major order: all the cells of the first row, then the second row, and so on. To read the cell at column x of row y, you apply the formula position = y × width + x and get there in a single step.
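The row-major mapping can be sketched in a few lines. Here x is the column and y the row, matching the y * width + x expression used by the article's query code; the function names and the tiny 4×3 grid are illustrative only.

```python
def cell_index(x, y, width):
    """Row-major offset of the cell at column x, row y."""
    return y * width + x

def cell_coords(index, width):
    """Inverse mapping: flat offset back to (x, y)."""
    y, x = divmod(index, width)
    return x, y

WIDTH, HEIGHT = 4, 3
grid = bytearray(WIDTH * HEIGHT)      # flat 1-D storage, one byte per cell
grid[cell_index(2, 1, WIDTH)] = 3     # mark column 2, row 1 as river (ID 3, assumed)

print(cell_index(2, 1, WIDTH))        # 6
print(cell_coords(6, WIDTH))          # (2, 1)
```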

File generation and memory mapping practice

Suppose we generate a test map in Python. We create an array of roughly 16 million bytes, simulate a river running through the middle, and write the array straight to a binary file. The result is a file of about 16MB (the example below also prepends an 8-byte header holding the width and height). Compared with JSON's 800MB, that is about 50 times smaller! For reading, we turn to the operating system's mmap (memory mapping), which feels almost like magic. A few lines of Python are enough to map the 16MB file into the process's address space. From then on, working with the mapped array is as easy as working with an ordinary Python bytes object, while the data behind it is managed directly by the operating system.

import struct

def generate_map(width, height, filename):
    print(f"Generating map {width}x{height} to {filename}...")

    with open(filename, 'wb') as f:
        # 1. Write the header (big-endian)
        # '>II' means: big-endian, unsigned int (width), unsigned int (height)
        f.write(struct.pack('>II', width, height))

        # 2. Write the body
        # Simulated terrain: rows near the middle are river (ID=3), the rest is grass (ID=0)
        row_data = bytearray(width)

        for y in range(height):
            # Simple terrain-generation logic
            if height // 2 - 50 <= y <= height // 2 + 50:
                terrain_id = 3  # River
            else:
                terrain_id = 0  # Grass

            # Fill the whole row with this terrain ID
            for x in range(width):
                row_data[x] = terrain_id

            f.write(row_data)

    print("Done.")

# Generate a large 4096 x 4096 map
generate_map(4096, 4096, 'rok_map.bin')

import mmap
import struct

class TerrainManager:
    def __init__(self, filename):
        self.f = open(filename, 'rb')
        # Create the memory mapping (length 0 maps the whole file)
        self.mm = mmap.mmap(self.f.fileno(), 0, access=mmap.ACCESS_READ)

        # Read the header
        # unpack returns a tuple, so take the first element
        self.width = struct.unpack('>I', self.mm[0:4])[0]
        self.height = struct.unpack('>I', self.mm[4:8])[0]

        self.data_offset = 8
        print(f"Map loaded: {self.width}x{self.height}")

    def get_terrain_id(self, x, y):
        if not (0 <= x < self.width and 0 <= y < self.height):
            return 255  # Out of bounds

        # O(1) random access
        index = self.data_offset + (y * self.width + x)

        # Indexing an mmap with an integer returns an int in Python 3
        return self.mm[index]

    def close(self):
        self.mm.close()
        self.f.close()

# Test queries
tm = TerrainManager('rok_map.bin')
print(f"Terrain at (0, 0): {tm.get_terrain_id(0, 0)}")  # -> 0 (Grass)
print(f"Terrain at (2048, 2048): {tm.get_terrain_id(2048, 2048)}")  # -> 3 (River)
tm.close()

The subtleties of single-byte design

Why insist on 1 byte instead of the "simpler" 4-byte integer? First, memory is byte-addressable. With 1 byte per cell, the relationship between grid coordinates and memory addresses is perfectly linear: the cell index is the byte offset, with no extra scaling when the CPU computes the address, so the instruction count is minimal. For routines such as pathfinding (the A* algorithm, for example) that are called millions of times per second, those savings add up. Second, an operating system memory page is typically 4KB. At 1 byte per cell, one page holds 4096 contiguous cells, which is a full row of a 4096-wide map, so accesses to neighboring cells hit the cache at a very high rate. At 4 bytes per cell, the same page covers only 1024 cells and the map grows to four times the size, which makes page faults and cache misses correspondingly more likely.
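The page arithmetic is easy to check. This is a back-of-the-envelope sketch that assumes the 4KB page size quoted in the text and a 4096×4096 map; actual page sizes vary by platform, and the helper names are illustrative.

```python
PAGE_SIZE = 4096          # assumed 4KB page, as in the text; real systems may differ
WIDTH = HEIGHT = 4096

def cells_per_page(bytes_per_cell, page_size=PAGE_SIZE):
    """How many map cells fit in one memory page."""
    return page_size // bytes_per_cell

def map_bytes(bytes_per_cell, width=WIDTH, height=HEIGHT):
    """Total size of the cell data, excluding any header."""
    return width * height * bytes_per_cell

for name, size in (("1-byte", 1), ("4-byte", 4)):
    print(name, cells_per_page(size), map_bytes(size) // (1024 * 1024), "MiB")
# 1-byte 4096 16 MiB
# 4-byte 1024 64 MiB
```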

mmap's zero copy and off-heap memory

When the game server starts and maps the 16MB file with mmap, why does it seem to finish instantly? Because it has not actually read the file yet. The mechanisms at work are "zero copy" and demand paging. mmap only establishes a mapping between virtual memory addresses and the file on disk; the data itself stays on disk. Only when your code actually reads a particular cell is a page fault triggered, at which point the operating system loads the small page (4KB) covering that cell from disk into physical memory. Moreover, these 16MB live in the operating system's page cache rather than in the game process's heap, so they put no pressure on the garbage collector.
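To make the "one page per touched cell" idea concrete, here is a small sketch. It assumes an 8-byte header followed by one byte per cell, writes a throwaway 256×256 file with a zeroed placeholder header, and uses a hypothetical helper page_of_cell to compute which file page a cell lives in. It does not measure page faults, and the OS may read ahead more than a single page.

```python
import mmap
import os
import tempfile

HEADER_SIZE = 8           # assumed: two 4-byte ints (width, height)
WIDTH = HEIGHT = 256      # small demo map

def page_of_cell(x, y, width, page_size=mmap.PAGESIZE):
    """Index of the file page that holds cell (x, y)."""
    return (HEADER_SIZE + y * width + x) // page_size

# Build a throwaway file: zeroed header placeholder plus one byte per cell.
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(bytes(HEADER_SIZE))
    f.write(bytes([7]) * (WIDTH * HEIGHT))

with open(path, "rb") as f:
    # Creating the mapping sets up address space; it does not copy the file into the heap.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Touching one cell makes the OS fault in the page containing it.
        value = mm[HEADER_SIZE + 200 * WIDTH + 10]
        print(value, page_of_cell(10, 200, WIDTH))

os.remove(path)
```

The page index printed depends on the platform's mmap.PAGESIZE, which is why no fixed output is shown.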

Cross-language standard solution

This approach can be reproduced faithfully in C/C++, Java, and Go. For SLG games like "Awakening of Kingdoms", storing a huge map is no dark art; it is a standard server-side solution. An extremely compact file, near-instant startup and loading, constant-time indexing, and complete independence from the garbage collector: together these ensure that when thousands of players are fighting on the same screen, map queries are no longer a performance bottleneck. This is also why mainstream SLG map editors usually export this kind of binary format directly.

In the games you have played, have you ever seen troops "drift" because the map stalled while loading? Share your experience in the comments and give this article a like so that more developers see the pain points players actually face. And don't forget to bookmark it for your next tech talk.