In the previous episode we learnt about memory offsets. In this episode, we are going to reach two very important milestones.
1 – Switching from Real Mode to Protected Mode and defining the Global Descriptor Table (GDT)
2 – Reading the kernel from disk and initializing a kernel written in C.
Let us have a quick look at what we are going to do:
After our boot sector loads, we will read the kernel from disk and load it into memory. After that we will define the Global Descriptor Table (GDT) and switch from real mode to protected mode. What is real mode? What is protected mode? Don’t worry, read on. Once we are in protected mode, we will initialize our kernel, which is written in C, and from this point onward we will move away from assembly language. Most likely we will still develop the file system in assembly, but that is not decided yet. So let’s understand the things mentioned in this paragraph.
Real Mode and Protected Mode
Every x86 CPU, whether 32-bit or 64-bit, boots into 16-bit real mode for backward compatibility. 16-bit processors stayed with us for a long time and many applications were developed for them, including some operating systems (DOS, Windows 3.11, etc.). When 32-bit CPUs arrived, companies wanted to make sure that 16-bit applications would still run on them, so processor vendors decided that a 32-bit CPU would start in real mode (16-bit) and that software would have to switch it explicitly into protected (32-bit) mode. Similarly, 64-bit CPUs still start in 16-bit mode and are then switched to Long Mode.
Now, why do we need 32-bit or 64-bit? Put simply, a 16-bit application can directly address 64 KB of memory, and with segmentation up to 1 MB. So if a computer has more than 1 MB of RAM, real-mode code cannot use the rest. Similarly, a 32-bit CPU can address at most 4 GB of memory.
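The memory limits above follow directly from how addresses are formed. The sketch below is only an illustration in C (the function names are made up for this post, they are not part of the boot loader): in real mode the CPU computes a physical address as segment * 16 + offset, and the number of addressable bytes is 2 raised to the number of address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: real-mode address translation.
   physical = segment * 16 + offset, with both inputs 16 bits wide. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Hypothetical helper: bytes addressable with the given number of
   address bits (16 -> 64 KB, 20 -> 1 MB, 32 -> 4 GB). */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

For example, segment 0x0000 with offset 0x7c00 and segment 0x07c0 with offset 0x0000 both refer to the same physical address, 0x7c00, which is where the BIOS places our boot sector.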
Apart from more memory access, Protected (32-bit) Mode also gives a lot more benefit over Real Mode. Some of the benefits are:
These are the major benefits of Protected Mode; there are some more. You can read more about Protected Mode on Wikipedia.
So now let’s read the kernel from disk, load it at a specified memory address, and then switch from real mode to protected mode. One of the reasons we switch to protected mode is our kernel: it is written in C, and the C compiler we are going to use will produce 32-bit binaries. So we need to switch to protected mode before we can run our kernel.
So, let’s start coding.
org 0x7c00
KERNEL_OFFSET equ 0x1000 ;We will load our kernel at 0x1000
mov [BOOT_DRIVE], dl ;Store the drive number on which system has booted in BOOT_DRIVE variable.
mov bp, 0x9000
mov sp, bp
mov bx, MSG_REAL_MODE ;We will display 'Started in 16-bit real mode' message to the user.
call print ;print the message.
call print_nl ;add a new line.
jmp $ ;infinite loop
BOOT_DRIVE db 0
MSG_REAL_MODE db "Started in 16-bit real mode", 0
times 510-($-$$) db 0 ;fill up boot loader upto 512 bytes.
dw 0xaa55 ;boot loader signature
Let us understand the above code. It basically defines the sequence of our boot code. If you try to assemble it now it will give errors, because the code is not complete: print and print_nl are not defined yet. I have provided comments in the code which explain what we are doing.
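The last two lines of that listing define the layout of a boot sector: the code is padded with zeros up to byte 510, and the final two bytes hold the signature. As a rough sketch of the same layout in C (the function names here are hypothetical and only for illustration), note that dw 0xaa55 is stored little-endian, so byte 510 is 0x55 and byte 511 is 0xaa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: builds a 512-byte boot sector from raw code bytes.
   Returns 0 on success, -1 if the code does not fit in the first 510 bytes. */
int build_boot_sector(const uint8_t *code, size_t code_len,
                      uint8_t sector[SECTOR_SIZE])
{
    assert(code != NULL && sector != NULL);
    if (code_len > SECTOR_SIZE - 2)
        return -1;
    memset(sector, 0, SECTOR_SIZE);   /* same effect as: times 510-($-$$) db 0 */
    memcpy(sector, code, code_len);
    sector[510] = 0x55;               /* dw 0xaa55, low byte first */
    sector[511] = 0xaa;
    return 0;
}

/* Hypothetical helper: checks the two signature bytes a BIOS looks for. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xaa;
}
```

This also shows why the boot loader has to stay small: everything, including the messages and the GDT we add later, must fit in 510 bytes.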
jmp $ ;infinite loop
print:
pusha
start:
mov al, [bx]
cmp al, 0
je done
mov ah, 0x0e
int 0x10
add bx, 1
jmp start
done:
popa
ret
print_nl:
pusha
mov ah, 0x0e
mov al, 0x0a ; newline char
int 0x10
mov al, 0x0d ; carriage return
int 0x10
popa
ret
We will write the above code just below line number 12 in boot.asm.
Next, we will read the kernel from disk and load it at the KERNEL_OFFSET (0x1000) memory offset.
call print
call print_nl
call load_kernel
call switch_to_pm
In boot.asm, we add the call load_kernel line shown above; the call switch_to_pm line is covered later in this post. Here we are just going to read the kernel from disk and load it into memory. NOTE: We are only loading it into memory, not executing it.
print_nl:
pusha
mov ah, 0x0e
mov al, 0x0a ; newline char
int 0x10
mov al, 0x0d ; carriage return
int 0x10
popa
ret
load_kernel:
mov bx, MSG_LOAD_KERNEL
call print
call print_nl
mov bx, KERNEL_OFFSET ;read from disk and store in 0x1000
mov dh, 1 ;read only 1 sector from HDD or bootable disk
mov dl, [BOOT_DRIVE]
call disk_load
ret
We will write the above code inside boot.asm, after the print_nl block. Also, at the bottom of the code, just below our MSG_REAL_MODE message, add the line below:
MSG_LOAD_KERNEL db "Loading kernel into memory", 0
The two code blocks above first print the message “Loading kernel into memory” on the screen and then set up the registers so that the data read from disk is stored at the specified memory offset. You will notice that we are using a disk_load routine, which reads from the disk. Below is that routine; we will write it below the load_kernel routine.
; load 'dh' sectors from drive 'dl' into ES:BX
disk_load:
pusha
; reading from disk requires setting specific values in all registers
; so we will overwrite our input parameters from 'dx'. Let's save it
; to the stack for later use.
push dx
mov ah, 0x02 ; ah <- int 0x13 function. 0x02 = 'read'
mov al, dh ; al <- number of sectors to read (0x01 .. 0x80)
mov cl, 0x02 ; cl <- sector (0x01 .. 0x11)
; 0x01 is our boot sector, 0x02 is the first 'available' sector
mov ch, 0x00 ; ch <- cylinder (0x0 .. 0x3FF, upper 2 bits in 'cl')
; dl <- drive number. Our caller sets it as a parameter and gets it from BIOS
; (0 = floppy, 1 = floppy2, 0x80 = hdd, 0x81 = hdd2)
mov dh, 0x00 ; dh <- head number (0x0 .. 0xF)
; [es:bx] <- pointer to buffer where the data will be stored
; caller sets it up for us, and it is actually the standard location for int 13h
int 0x13 ; BIOS interrupt
jc disk_error ; if error (stored in the carry bit)
pop dx
cmp al, dh ; BIOS also sets 'al' to the # of sectors read. Compare it.
jne sectors_error
popa
ret
disk_error:
mov bx, DISK_ERROR
call print
call print_nl
mov dh, ah ; ah = error code, dl = disk drive that dropped the error
call print_hex ; check out the code at http://stanislavs.org/helppc/int_13-1.html
jmp disk_loop
sectors_error:
mov bx, SECTORS_ERROR
call print
disk_loop:
jmp $
; receiving the data in 'dx'
; For the examples we'll assume that we're called with dx=0x1234
print_hex:
pusha
mov cx, 0 ; our index variable
; Strategy: get the last char of 'dx', then convert to ASCII
; Numeric ASCII values: '0' (ASCII 0x30) to '9' (0x39), so just add 0x30 to byte N.
; For alphabetic characters A-F: 'A' (ASCII 0x41) to 'F' (0x46) we'll add 0x37 (0x30 + 7)
; Then, move the ASCII byte to the correct position on the resulting string
hex_loop:
cmp cx, 4 ; loop 4 times
je end_hex
; 1. convert last char of 'dx' to ascii
mov ax, dx ; we will use 'ax' as our working register
and ax, 0x000f ; 0x1234 -> 0x0004 by masking first three to zeros
add al, 0x30 ; add 0x30 to N to convert it to ASCII "N"
cmp al, 0x39 ; if > 9, add an extra 7 to reach 'A' to 'F'
jle step2
add al, 7 ; 'A' is ASCII 65 instead of 58, so 65-58=7
step2:
; 2. get the correct position of the string to place our ASCII char
; bx <- base address + string length - index of char
mov bx, HEX_OUT + 5 ; base + length
sub bx, cx ; our index variable
mov [bx], al ; copy the ASCII char on 'al' to the position pointed by 'bx'
ror dx, 4 ; 0x1234 -> 0x4123 -> 0x3412 -> 0x2341 -> 0x1234
; increment index and loop
add cx, 1
jmp hex_loop
end_hex:
; prepare the parameter and call the function
; remember that print receives parameters in 'bx'
mov bx, HEX_OUT
call print
popa
ret
HEX_OUT:
db '0x0000',0 ; reserve memory for our new string
We will also put two messages at the bottom of our code file:
DISK_ERROR db "Disk read error", 0
SECTORS_ERROR db "Incorrect number of sectors read", 0
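If the print_hex routine is hard to follow in assembly, here is the same idea as a small C sketch. This is not code from the boot loader; format_hex16 is a name made up for this illustration. It takes the lowest 4 bits, adds 0x30, adds a further 7 for values above 9, and fills the string from right to left, just as print_hex fills HEX_OUT. The assembly rotates dx with ror so that the register ends up unchanged, while the sketch simply shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes "0x" plus four uppercase hex digits and a
   terminating NUL into out (7 bytes), mirroring what print_hex stores
   in HEX_OUT. */
void format_hex16(uint16_t value, char out[7])
{
    assert(out != NULL);
    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 4; i++) {
        uint8_t nibble = (uint8_t)(value & 0x000f); /* lowest 4 bits */
        char c = (char)(nibble + 0x30);             /* 0..9 -> '0'..'9' */
        if (c > 0x39)
            c = (char)(c + 7);                      /* 10..15 -> 'A'..'F' */
        out[5 - i] = c;                             /* HEX_OUT + 5 - index */
        value = (uint16_t)(value >> 4);             /* move to the next digit */
    }
    out[6] = '\0';
}
```

In disk_error the BIOS error code is copied from ah into dh before print_hex is called, so the first two hex digits on screen are the error code and the last two are the drive number from dl.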
Now comes the most important part: switching to protected mode and printing on the screen. Write the line below in boot.asm after call load_kernel:
call load_kernel
call switch_to_pm
jmp $ ;infinite loop
The switch_to_pm routine looks like the following:
switch_to_pm:
cli ; 1. disable interrupts
lgdt [gdt_descriptor] ; 2. load the GDT descriptor
mov eax, cr0
or eax, 0x1 ; 3. set the PE (protection enable) bit in cr0
mov cr0, eax
jmp CODE_SEG:init_pm ; 4. far jump by using a different segment
We disable interrupts first: once we switch to protected mode, the BIOS interrupt routines we have been using are no longer usable, and we have not set up any protected-mode interrupt handlers yet. The new instruction you see here is lgdt, which loads the Global Descriptor Table Register (GDTR) with the size and address of our GDT. So before we switch to protected mode, we need to define our GDT (Global Descriptor Table). We will define it with the code below:
gdt_start:
dd 0x0 ;4 bytes
dd 0x0 ;4 bytes
gdt_code:
dw 0xffff ;segment length, bits 0-15
dw 0x0 ;segment base, bits 0-15
db 0x0 ;segment base, bits 16-23
db 10011010b ;flags (8 bits)
db 11001111b ;flags (4 bits) + segment length, bits 16-19
db 0x0 ;segment base, bits 24-31
gdt_data:
dw 0xffff
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
gdt_end:
gdt_descriptor:
dw gdt_end - gdt_start - 1 ;size (16-bit), always one less of its true size
dd gdt_start ;address (32-bit)
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start
To understand Protected Mode and the GDT, read https://en.wikipedia.org/wiki/Protected_mode. I have also taken a lot of inspiration from an existing GitHub repository; for the basics, you can read about the GDT there as well: https://github.com/cfenollosa/os-tutorial/tree/master/09-32bit-gdt
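The byte layout of a GDT entry is easier to see when it is built from its parts. The C sketch below is only an illustration (encode_gdt_entry and gdt_selector are hypothetical names, not part of boot.asm). It packs a base, a 20-bit limit, the access byte and the 4 flag bits into the same 8 bytes that gdt_code and gdt_data spell out by hand, and it shows that a selector such as CODE_SEG is simply the byte offset of the entry inside the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs one 8-byte segment descriptor in the same
   byte order as gdt_code / gdt_data. limit is 20 bits wide; flags is the
   upper nibble of byte 6 (granularity, 32-bit default size, and so on). */
void encode_gdt_entry(uint8_t out[8], uint32_t base, uint32_t limit,
                      uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    out[0] = (uint8_t)(limit & 0xFF);          /* limit, bits 0-7   */
    out[1] = (uint8_t)((limit >> 8) & 0xFF);   /* limit, bits 8-15  */
    out[2] = (uint8_t)(base & 0xFF);           /* base, bits 0-7    */
    out[3] = (uint8_t)((base >> 8) & 0xFF);    /* base, bits 8-15   */
    out[4] = (uint8_t)((base >> 16) & 0xFF);   /* base, bits 16-23  */
    out[5] = access;                           /* present, privilege, type */
    out[6] = (uint8_t)((flags << 4) | ((limit >> 16) & 0x0F));
    out[7] = (uint8_t)((base >> 24) & 0xFF);   /* base, bits 24-31  */
}

/* Hypothetical helper: a selector is the entry index times 8, with the
   requested privilege level in the low two bits (table indicator = 0). */
uint16_t gdt_selector(uint16_t index, uint8_t rpl)
{
    assert(rpl <= 3);
    return (uint16_t)((index << 3) | rpl);
}
```

Calling encode_gdt_entry with base 0, limit 0xFFFFF, access 0x9A (10011010b) and flags 0xC (1100b) gives the bytes of gdt_code; using access 0x92 (10010010b) instead gives gdt_data. With the null descriptor at index 0, the code segment at index 1 and the data segment at index 2, the selectors come out as 0x08 and 0x10, which is what CODE_SEG and DATA_SEG evaluate to.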
Now we are in protected mode, and we need to print a message on the screen from protected mode. But we can no longer use our previous print code: it relies on BIOS interrupt 0x10, and remember that we disabled interrupts to switch to protected mode. So, in order to print on the screen, we will write directly to video memory. To do this, we will write the code below:
VIDEO_MEMORY equ 0xb8000
WHITE_ON_BLACK equ 0x0f ; the color byte for each character
print_string_pm:
pusha
mov edx, VIDEO_MEMORY
print_string_pm_loop:
mov al, [ebx] ; [ebx] is the address of our character
mov ah, WHITE_ON_BLACK
cmp al, 0 ; check if end of string
je print_string_pm_done
mov [edx], ax ; store character + attribute in video memory
add ebx, 1 ; next char
add edx, 2 ; next video memory position
jmp print_string_pm_loop
print_string_pm_done:
popa
ret
We also need to put the line below at the bottom of the code.
MSG_PROT_MODE db "Loaded 32-bit protected mode", 0
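To make the video memory layout a little more concrete, here is a rough C sketch of what print_string_pm does. The helper names are made up for this post, and the 80x25 screen size is an assumption about the text mode the BIOS leaves us in; nothing in boot.asm depends on it yet. Every character cell takes 2 bytes: the character in the low byte and the color attribute in the high byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80          /* assumed text mode: 80 columns */
#define VGA_ROWS 25          /* assumed text mode: 25 rows    */
#define WHITE_ON_BLACK 0x0f

/* Hypothetical helper: one screen cell, character low, attribute high. */
uint16_t vga_cell(char ch, uint8_t attr)
{
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)ch);
}

/* Hypothetical helper: byte offset of (row, col) from the start of
   video memory (0xb8000 on the real machine). */
size_t vga_offset(unsigned row, unsigned col)
{
    assert(row < VGA_ROWS && col < VGA_COLS);
    return 2u * ((size_t)row * VGA_COLS + col);
}

/* Same loop as print_string_pm, but writing into a caller-supplied
   buffer of VGA_ROWS * VGA_COLS cells instead of real video memory. */
void print_string_at(uint16_t *screen, unsigned row, unsigned col,
                     const char *s)
{
    size_t cell = vga_offset(row, col) / 2;
    while (*s != '\0' && cell < (size_t)VGA_ROWS * VGA_COLS)
        screen[cell++] = vga_cell(*s++, WHITE_ON_BLACK);
}
```

print_string_pm always starts at offset 0, which is why our protected mode message shows up in the top-left corner of the screen, on top of whatever the BIOS printed there earlier.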
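This completes our boot.asm file. The full listing also contains the init_pm and BEGIN_PM routines, which set up the segment registers and the stack after the far jump and then print the protected mode message. Our entire boot.asm file now looks as follows (it should be restructured / split into multiple files):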
org 0x7c00
KERNEL_OFFSET equ 0x1000
mov [BOOT_DRIVE], dl
mov bp, 0x9000
mov sp, bp
mov bx, MSG_REAL_MODE
call print
call print_nl
call load_kernel
call switch_to_pm
jmp $ ;infinite loop
print:
pusha
start:
mov al, [bx]
cmp al, 0
je done
mov ah, 0x0e
int 0x10
add bx, 1
jmp start
done:
popa
ret
print_nl:
pusha
mov ah, 0x0e
mov al, 0x0a ; newline char
int 0x10
mov al, 0x0d ; carriage return
int 0x10
popa
ret
load_kernel:
mov bx, MSG_LOAD_KERNEL
call print
call print_nl
mov bx, KERNEL_OFFSET ;read from disk and store in 0x1000
mov dh, 1 ;read only 1 sector from HDD or bootable disk
mov dl, [BOOT_DRIVE]
call disk_load
ret
; load 'dh' sectors from drive 'dl' into ES:BX
disk_load:
pusha
; reading from disk requires setting specific values in all registers
; so we will overwrite our input parameters from 'dx'. Let's save it
; to the stack for later use.
push dx
mov ah, 0x02 ; ah <- int 0x13 function. 0x02 = 'read'
mov al, dh ; al <- number of sectors to read (0x01 .. 0x80)
mov cl, 0x02 ; cl <- sector (0x01 .. 0x11)
; 0x01 is our boot sector, 0x02 is the first 'available' sector
mov ch, 0x00 ; ch <- cylinder (0x0 .. 0x3FF, upper 2 bits in 'cl')
; dl <- drive number. Our caller sets it as a parameter and gets it from BIOS
; (0 = floppy, 1 = floppy2, 0x80 = hdd, 0x81 = hdd2)
mov dh, 0x00 ; dh <- head number (0x0 .. 0xF)
; [es:bx] <- pointer to buffer where the data will be stored
; caller sets it up for us, and it is actually the standard location for int 13h
int 0x13 ; BIOS interrupt
jc disk_error ; if error (stored in the carry bit)
pop dx
cmp al, dh ; BIOS also sets 'al' to the # of sectors read. Compare it.
jne sectors_error
popa
ret
disk_error:
mov bx, DISK_ERROR
call print
call print_nl
mov dh, ah ; ah = error code, dl = disk drive that dropped the error
call print_hex ; check out the code at http://stanislavs.org/helppc/int_13-1.html
jmp disk_loop
sectors_error:
mov bx, SECTORS_ERROR
call print
disk_loop:
jmp $
; receiving the data in 'dx'
; For the examples we'll assume that we're called with dx=0x1234
print_hex:
pusha
mov cx, 0 ; our index variable
; Strategy: get the last char of 'dx', then convert to ASCII
; Numeric ASCII values: '0' (ASCII 0x30) to '9' (0x39), so just add 0x30 to byte N.
; For alphabetic characters A-F: 'A' (ASCII 0x41) to 'F' (0x46) we'll add 0x37 (0x30 + 7)
; Then, move the ASCII byte to the correct position on the resulting string
hex_loop:
cmp cx, 4 ; loop 4 times
je end_hex
; 1. convert last char of 'dx' to ascii
mov ax, dx ; we will use 'ax' as our working register
and ax, 0x000f ; 0x1234 -> 0x0004 by masking first three to zeros
add al, 0x30 ; add 0x30 to N to convert it to ASCII "N"
cmp al, 0x39 ; if > 9, add an extra 7 to reach 'A' to 'F'
jle step2
add al, 7 ; 'A' is ASCII 65 instead of 58, so 65-58=7
step2:
; 2. get the correct position of the string to place our ASCII char
; bx <- base address + string length - index of char
mov bx, HEX_OUT + 5 ; base + length
sub bx, cx ; our index variable
mov [bx], al ; copy the ASCII char on 'al' to the position pointed by 'bx'
ror dx, 4 ; 0x1234 -> 0x4123 -> 0x3412 -> 0x2341 -> 0x1234
; increment index and loop
add cx, 1
jmp hex_loop
end_hex:
; prepare the parameter and call the function
; remember that print receives parameters in 'bx'
mov bx, HEX_OUT
call print
popa
ret
HEX_OUT:
db '0x0000',0 ; reserve memory for our new string
gdt_start:
dd 0x0 ;4 bytes
dd 0x0 ;4 bytes
gdt_code:
dw 0xffff ;segment length, bits 0-15
dw 0x0 ;segment base, bits 0-15
db 0x0 ;segment base, bits 16-23
db 10011010b ;flags (8 bits)
db 11001111b ;flags (4 bits) + segment length, bits 16-19
db 0x0 ;segment base, bits 24-31
gdt_data:
dw 0xffff
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
gdt_end:
gdt_descriptor:
dw gdt_end - gdt_start - 1 ;size (16-bit), always one less of its true size
dd gdt_start ;address (32-bit)
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start
switch_to_pm:
cli ; 1. disable interrupts
lgdt [gdt_descriptor] ; 2. load the GDT descriptor
mov eax, cr0
or eax, 0x1 ; 3. set the PE (protection enable) bit in cr0
mov cr0, eax
jmp CODE_SEG:init_pm ; 4. far jump by using a different segment
use32
init_pm:
mov ax, DATA_SEG
mov ds, ax
mov ss, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ebp, 0x90000
mov esp, ebp
call BEGIN_PM
BEGIN_PM:
mov ebx, MSG_PROT_MODE
call print_string_pm
call KERNEL_OFFSET
jmp $
VIDEO_MEMORY equ 0xb8000
WHITE_ON_BLACK equ 0x0f ; the color byte for each character
print_string_pm:
pusha
mov edx, VIDEO_MEMORY
print_string_pm_loop:
mov al, [ebx] ; [ebx] is the address of our character
mov ah, WHITE_ON_BLACK
cmp al, 0 ; check if end of string
je print_string_pm_done
mov [edx], ax ; store character + attribute in video memory
add ebx, 1 ; next char
add edx, 2 ; next video memory position
jmp print_string_pm_loop
print_string_pm_done:
popa
ret
BOOT_DRIVE db 0
MSG_REAL_MODE db "Started in 16-bit real mode", 0
MSG_PROT_MODE db "Loaded 32-bit protected mode", 0
MSG_LOAD_KERNEL db "Loading kernel into memory", 0
DISK_ERROR db "Disk read error", 0
SECTORS_ERROR db "Incorrect number of sectors read", 0
times 510-($-$$) db 0
dw 0xaa55
Now that we are in protected mode, we will run our kernel, which has already been loaded into memory. We are developing our kernel as 32-bit code in C. If you look at the BEGIN_PM routine in the code above, you will notice that we call KERNEL_OFFSET, which means we transfer control to address 0x1000. This executes our kernel.
Create a file called kernel.c and write the code below:
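void main() {
/* 0xb8000 is the start of text-mode video memory */
char* video_memory = (char*) 0xb8000;
/* write 'X' into the first character cell (top-left corner) */
*video_memory = 'X';
}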
So that is our kernel for now: a tiny little kernel which prints X on the screen.
We also need a middle man that takes control from the boot loader and passes it to the kernel. For this we will create a kernel loader. Let’s create a file called loader.asm and write the code below:
format ELF ;instruct assembler to produce ELF (Executable and Linkable Format) file.
extrn main ;declare main as an external symbol; it is defined in kernel.c and resolved by the linker.
public _start
_start:
call main ;call external main function.
jmp $
Now, using the assembler and the C compiler, we are going to compile our code and link it, so that the loader can call the kernel correctly. For this we need a toolchain that can produce ELF files (a cross-compiler). We are going to use GCC. I am doing all of this on Windows, so I am going to use WSL. You can go through my previous blog post to install and set up WSL as well as the cross-compiler.
To compile the above code, we create a bat file called compile.bat with the following commands:
echo off
echo "clean all binaries"
del *.bin
del *.o
del *.elf
echo "compile boot.asm"
fasm boot.asm
echo "compile loader.asm"
fasm loader.asm
echo "compile kernel.c"
wsl gcc -m32 -ffreestanding -c kernel.c -o kernel.o
wsl objcopy kernel.o -O elf32-i386 kernel.elf
wsl /usr/local/i386elfgcc/bin/i386-elf-ld -o kernel.bin -Ttext 0x1000 loader.o kernel.elf --oformat binary
type boot.bin kernel.bin > os_image.bin
qemu-system-x86_64 os_image.bin
Now run compile.bat from the command line. QEMU should boot the image, and the X written by the kernel should appear in the top-left corner of the screen.
This will put a big smile on our faces. We have achieved a great milestone: booting, reading the kernel from disk, switching to protected mode, and executing the kernel.
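One thing to keep in mind as the kernel grows: load_kernel reads only 1 sector, and os_image.bin is simply boot.bin followed by kernel.bin, so the kernel starts at byte 512 of the image. The C sketch below (hypothetical helper names, written only to show the arithmetic) works out how many 512-byte sectors a kernel of a given size needs, and which 1-based sector number a byte offset corresponds to on the first track.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical helper: sectors needed to hold `bytes` bytes, rounded up.
   This is the value load_kernel would have to put in dh. */
uint32_t sectors_needed(uint32_t bytes)
{
    return bytes / SECTOR_SIZE + (bytes % SECTOR_SIZE != 0 ? 1u : 0u);
}

/* Hypothetical helper: 1-based sector number (cylinder 0, head 0) for a
   byte offset in the image. Only valid while the offset stays on the
   first track; sectors_per_track depends on the disk geometry. */
uint8_t first_track_sector(uint32_t image_offset, uint32_t sectors_per_track)
{
    uint32_t lba = image_offset / SECTOR_SIZE;
    assert(sectors_per_track <= 63 && lba < sectors_per_track);
    return (uint8_t)(lba + 1);
}
```

Byte offset 512 maps to sector 2, which matches mov cl, 0x02 in disk_load. If kernel.bin ever becomes larger than 512 bytes, the value in dh has to be raised to match, otherwise only the first part of the kernel ends up in memory.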
In the next chapter, we will create a video driver entirely in C, so that we can show a blinking cursor, clear the screen, and print messages on the screen using functions like printf. We will create our own printf function. Yes, exciting! So stay tuned for the next blog entry, which will come up very soon.
You can access the LearnOS code repository at: https://github.com/dhavalhirdhav/LearnOS
Great tutorial! What is the equivalent of these commands on Linux?
wsl gcc -m32 -ffreestanding -c kernel.c -o kernel.o
wsl objcopy kernel.o -O elf32-i386 kernel.elf
wsl /usr/local/i386elfgcc/bin/i386-elf-ld -o kernel.bin -Ttext 0x1000 loader.o kernel.elf --oformat binary
type boot.bin kernel.bin > os_image.bin
Remove wsl from the start of each command and it will work on Linux; for the last line, use cat instead of type. 🙂