4. Data Type Parsing Functions
For example, suppose we have:
TestStructName value[5][3] = {{...},{...}};
// or
char** c = ...;
// or
typedef enum { MODE_A, MODE_B = 5, MODE_C } Mode;
We need reasonably smart code that is able to parse these patterns.
Basic Datatype Parsing
char c;
Datatype Enum
We need to implement an enum that will hold the parsed data; let's name it 'DataType'. It has to be an enum, because it will represent different kinds of data types, such as arrays, pointers, enums, etc.
But for now, only give it these types:
- Data -> a name and a bool saying whether it is unsigned.
- Pointer -> Box<DataType>
⚠️ Implementation
#[derive(Debug, Clone)]
pub enum DataType {
    Data { name: String, unsigned: bool },
    Pointer(Box<DataType>),
}
Parse Function
Declaration
pub fn parse(parser: &mut Parser) -> Result<DataType> {...}
Function
- Parse the current tokens and output the appropriate 'DataType'.
- Remember to check for TokenKind::Unsigned at the beginning. For example: unsigned int uint = 16;
- Select the right function depending on the kind of the next token it encounters:
  - identifier -> identifier_type();
  - enum -> enum_type();
  - struct -> struct_type();
- These functions output Result<DataType>, so we just return their results directly after adding some context.
⚠️ Implementation
pub fn parse(parser: &mut Parser) -> Result<DataType> {
    // An optional leading `unsigned` keyword, as in `unsigned int uint = 16;`.
    let unsigned = {
        let current = parser.current();

        let unsigned = current.kind == TokenKind::Unsigned;
        if unsigned {
            parser.advance();
        }
        unsigned
    };

    // Consume the next token and dispatch on its kind.
    let current = parser.advance().to_owned();
    match current.kind {
        TokenKind::Identifier => {
            identifier_type(parser, unsigned, current).context("types::parse -> identifier")
        }
        TokenKind::Enum => enum_type(parser).context("types::parse -> Enum"),
        TokenKind::Struct => struct_type(parser).context("types::parse -> Struct"),
        other => {
            bail!(
                "types::parse: expected to find 'Identifier' || 'Enum' || 'Struct', found: {:?} -> parsing of this token kind as datatype is not supported",
                other
            )
        }
    }
}
Identifier Type
It has to output DataType::Data. It does this by taking the current token as the name, together with a boolean saying whether it is unsigned.
Then it checks whether the data is wrapped in pointers: wrap the data in a DataType::Pointer once for each token of kind 'Star'.
⚠️ Implementation
fn identifier_type(parser: &mut Parser, unsigned: bool, current: Token) -> Result<DataType> {
    let mut output = DataType::Data {
        name: current.value,
        unsigned,
    };
    // Each `*` adds one more pointer layer around the type built so far.
    while parser.current().kind == TokenKind::Star {
        output = DataType::Pointer(Box::new(output));
        parser.advance();
    }
    Ok(output)
}
Now update the call to 'identifier_type()' inside 'parse()'. You should now be able to parse the type from the example. In this tutorial we will test this later, after implementing variable declaration, but you can test it out right now if you want!
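To see the pointer wrapping in isolation, here is a minimal, self-contained sketch. It does not use the tutorial's Parser or Token types: the `Tok` enum and the `parse_simple` function are hypothetical stand-ins that work on a plain slice of tokens, and `DataType` is redefined locally with `PartialEq` so the result can be compared.

```rust
// Hypothetical, simplified stand-in for the tutorial's token type.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Unsigned,
    Identifier(String),
    Star,
}

// Local copy of the two variants introduced so far.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Data { name: String, unsigned: bool },
    Pointer(Box<DataType>),
}

/// Parses `[unsigned] <identifier> {*}` from the front of `tokens`.
/// Returns the parsed type and the number of tokens consumed.
pub fn parse_simple(tokens: &[Tok]) -> Result<(DataType, usize), String> {
    let mut pos = 0;

    // Optional leading `unsigned`.
    let unsigned = tokens.first() == Some(&Tok::Unsigned);
    if unsigned {
        pos += 1;
    }

    // The base type name.
    let name = match tokens.get(pos) {
        Some(Tok::Identifier(name)) => name.clone(),
        other => return Err(format!("expected an identifier, found {other:?}")),
    };
    pos += 1;

    // Each `*` wraps what we have so far in one more pointer layer.
    let mut output = DataType::Data { name, unsigned };
    while tokens.get(pos) == Some(&Tok::Star) {
        output = DataType::Pointer(Box::new(output));
        pos += 1;
    }
    Ok((output, pos))
}
```

Under these assumptions, the tokens of `unsigned char** c` produce a Pointer wrapping a Pointer wrapping Data { name: "char", unsigned: true }, and parsing stops before the variable name `c`.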
Parsing Arrays
int c[5] = ...;
Data Type
We need to add a variant for it to the DataType enum. It needs to have:
- length
- inside: the 'DataType' of its elements
⚠️ Implementation
pub enum DataType {
    ...
    Array { length: u32, inside: Box<DataType> },
}
Parsing
The function that parses an array is not called by the 'parse' function, because the array brackets come after the name of a variable. Instead, it is called by the variable declaration function, and it wraps the current data type inside the Array data type.
Example
int c[1][2]
Will be converted into:
DataType::Array {
    length: 1,
    inside: Box::new(DataType::Array {
        length: 2,
        inside: Box::new(DataType::Data { name: "int".to_string(), unsigned: false }),
    }),
}
Wrap_Data_Type_in_an_Array
Declaration
pub fn wrap_data_type_in_an_array(mut data_type: DataType, parser: &mut Parser) -> Result<DataType> {...}
Function
Wraps its input in arrays, once for each [ token. Remember to read the length specified inside the brackets, and then expect the ] token. The first pair of brackets must end up as the outermost array, as in the example above.
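The nesting order is easy to get backwards, so here is a small sketch of the wrapping step on its own. It assumes the lengths have already been read from the brackets into a slice in source order; the `wrap_in_arrays` helper is hypothetical and is not part of the tutorial's code, and `DataType` is redefined locally with only the variants needed here.

```rust
// Local copy of the variants needed for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Data { name: String, unsigned: bool },
    Array { length: u32, inside: Box<DataType> },
}

/// Wraps `base` in one `Array` per entry of `lengths`, where `lengths`
/// lists the dimensions in source order (`int c[1][2]` -> `[1, 2]`).
pub fn wrap_in_arrays(base: DataType, lengths: &[u32]) -> DataType {
    // Go from the last dimension to the first, so the last dimension ends up
    // innermost and the first one outermost.
    lengths.iter().rev().fold(base, |inside, &length| DataType::Array {
        length,
        inside: Box::new(inside),
    })
}
```

Wrapping in reading order instead (first [1], then [2]) would put the first dimension innermost, which is the opposite of the `int c[1][2]` example.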
⚠️ Implementation
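pub fn wrap_data_type_in_an_array(
    mut data_type: DataType,
    parser: &mut Parser,
) -> Result<DataType> {
    // Collect the lengths in source order: `int c[1][2]` gives [1, 2].
    let mut lengths = Vec::new();
    while parser.current().kind == TokenKind::OpenBracket {
        parser.expect(TokenKind::OpenBracket)?;
        let length =
            parsing_functions::data_parsing::str_to_num(&parser.expect(TokenKind::Number)?.value)?;
        parser.expect(TokenKind::CloseBracket)?;
        lengths.push(length);
    }

    // Wrap from the last dimension to the first, so that the first pair of
    // brackets becomes the outermost array, as in the example above.
    for length in lengths.into_iter().rev() {
        data_type = DataType::Array {
            length,
            inside: Box::new(data_type),
        };
    }

    Ok(data_type)
}
Parsing Structs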
typedef struct {
    int a;
} str1;
Data Type
The struct data type just needs to know its properties. We will implement a Property struct inside expression.rs, because we will also be using it in other places. It needs to know its name and data type.
⚠️ Implementation
use crate::parser::types::DataType;

#[derive(Debug, Clone)]
pub struct Property {
    pub var_name: String,
    pub var_type: DataType,
}
...
DataType::Struct
⚠️ Implementation
#[derive(Debug, Clone)]
pub enum DataType {
    Struct { properties: Vec<Property> },
    ...
}
Parsing
We need to expect the opening and closing curly braces; the struct keyword itself has already been consumed by 'parse()'. Afterwards we need to parse all the properties. To do this:
- Get the data type -> use 'parse()'
- Get the variable name -> read the value of the next identifier token
- Expect a semicolon after the property
- Repeat until the next token is a closing curly brace.
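The loop can be sketched on a plain token slice as follows. This is a simplified illustration rather than the tutorial's implementation: `Tok`, `expect_identifier` and `parse_struct_body` are hypothetical names, the types are local copies (with `unsigned` left out for brevity), member types are limited to an identifier followed by optional stars, and it assumes every member ends with a semicolon and the list ends at the closing curly brace.

```rust
// Hypothetical, simplified stand-in for the tutorial's token type.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Identifier(String),
    Star,
    SemiColon,
    CloseCurly,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Data { name: String },
    Pointer(Box<DataType>),
    Struct { properties: Vec<Property> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Property {
    pub var_name: String,
    pub var_type: DataType,
}

// Returns the identifier at `pos` and moves past it, or an error.
fn expect_identifier(tokens: &[Tok], pos: &mut usize) -> Result<String, String> {
    match tokens.get(*pos) {
        Some(Tok::Identifier(name)) => {
            *pos += 1;
            Ok(name.clone())
        }
        other => Err(format!("expected an identifier, found {other:?}")),
    }
}

/// Parses `<type> {*} <name> ;` repeatedly until `}`.
/// `tokens` starts just after the opening `{`.
pub fn parse_struct_body(tokens: &[Tok]) -> Result<DataType, String> {
    let mut pos = 0;
    let mut properties = Vec::new();
    while tokens.get(pos) != Some(&Tok::CloseCurly) {
        // 1. The property's data type (here: a name plus optional pointers).
        let mut var_type = DataType::Data {
            name: expect_identifier(tokens, &mut pos)?,
        };
        while tokens.get(pos) == Some(&Tok::Star) {
            var_type = DataType::Pointer(Box::new(var_type));
            pos += 1;
        }
        // 2. The property's name.
        let var_name = expect_identifier(tokens, &mut pos)?;
        // 3. The terminating semicolon.
        if tokens.get(pos) != Some(&Tok::SemiColon) {
            return Err(format!("expected ';' after property '{var_name}'"));
        }
        pos += 1;
        properties.push(Property { var_name, var_type });
    }
    Ok(DataType::Struct { properties })
}
```

If the tokens run out before a closing curly brace is found, the identifier check fails and the function returns an error instead of looping forever.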
⚠️ Implementation
fn struct_type(parser: &mut Parser) -> Result<DataType> {
    parser.expect(TokenKind::OpenCurly)?;
    let mut properties = Vec::new();
    // Each property is `<type> <name>;`, repeated until the closing brace.
    while parser.current().kind != TokenKind::CloseCurly {
        let data_type = parse(parser)?;
        let name = parser.expect(TokenKind::Identifier)?.value;
        parser
            .expect(TokenKind::SemiColon)
            .context("expected to find a semicolon after an expression - struct contents")?;

        properties.push(Property {
            var_name: name,
            var_type: data_type,
        });
    }

    parser.expect(TokenKind::CloseCurly)?;
    Ok(DataType::Struct { properties })
}
Parsing Enums
typedef enum { MODE_A, MODE_B = 5, MODE_C } Mode;
Data Type
First we need to define a new type for the enum fields, because they don't match any of the previous types that we used. It needs a name and a value (an integer).
⚠️ Implementation
#[derive(Debug, Clone)]
pub struct EnumField {
    name: String,
    value: i32,
}
The variant inside the DataType will only store the fields.
⚠️ Implementation
#[derive(Debug, Clone)]
pub enum DataType {
    Enum { fields: Vec<EnumField> },
    ...
}
Parsing
To parse an enum, we read the name of a field, then check whether a value is assigned to it. If a value is set, it carries over to the following fields, increased by one each time. Fields that come before any explicit value start counting from 0.
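The value rule can be shown without any token handling. In this sketch, the hypothetical `assign_enum_values` function takes each field as a name plus an optional explicit value, which is an assumption made for illustration and not how the tutorial's parser receives its input.

```rust
/// Assigns a value to every field: an explicit value is used as is, and a
/// field without one gets the previous field's value plus one (starting at 0).
pub fn assign_enum_values(fields: &[(&str, Option<i32>)]) -> Vec<(String, i32)> {
    let mut next_value = 0;
    let mut output = Vec::new();
    for &(name, explicit) in fields {
        // Use the explicit value if there is one, otherwise keep counting.
        let value = explicit.unwrap_or(next_value);
        output.push((name.to_string(), value));
        next_value = value + 1;
    }
    output
}
```

For `typedef enum { MODE_A, MODE_B = 5, MODE_C } Mode;` the input would be MODE_A with no value, MODE_B with 5 and MODE_C with no value, and the result is MODE_A = 0, MODE_B = 5, MODE_C = 6.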
⚠️ Implementation
fn enum_type(parser: &mut Parser) -> Result<DataType> {
    parser.expect(TokenKind::OpenCurly)?;
    let mut current_value: i32 = 0;
    let mut fields = Vec::new();
    let mut end = false;
    while !end {
        let field_name = parser.expect(TokenKind::Identifier)?.value;
        match parser.advance().kind {
            // `NAME = [-]number`: take the explicit value.
            TokenKind::Equals => {
                let sign: i32 = if parser.current().kind == TokenKind::Minus {
                    parser.advance();
                    -1
                } else {
                    1
                };
                current_value = str_to_num(&parser.expect(TokenKind::Number)?.value)? as i32 * sign;
                end = parser.advance().kind == TokenKind::CloseCurly;
            }
            // `NAME,`: keep the running value.
            TokenKind::Comma => {}
            // `NAME }`: this was the last field.
            TokenKind::CloseCurly => {
                end = true;
            }
            kind => {
                bail!(
                    "expected to find token of kind: 'Comma' || 'Equals' || 'CloseCurly', found: '{kind:?}'"
                )
            }
        }

        fields.push(EnumField {
            name: field_name,
            value: current_value,
        });

        // The next field defaults to the previous value plus one.
        current_value += 1;
    }
    Ok(DataType::Enum { fields })
}
End Result
In the end, your parser/types/mod.rs file should look like this:
⚠️ Implementation
use crate::lexer::token::Token;
use crate::parser::expression::Property;
use crate::parser::{Parser, parsing_functions};
use crate::{lexer::token::TokenKind, parser::parsing_functions::data_parsing::str_to_num};
use anyhow::{Context, Result, bail};

#[derive(Debug, Clone)]
pub struct EnumField {
    name: String,
    value: i32,
}

#[derive(Debug, Clone)]
pub enum DataType {
    Array { length: u32, inside: Box<DataType> },
    Data { name: String, unsigned: bool },
    Struct { properties: Vec<Property> },
    Enum { fields: Vec<EnumField> },
    Pointer(Box<DataType>),
}

pub fn parse(parser: &mut Parser) -> Result<DataType> {
    // An optional leading `unsigned` keyword.
    let unsigned = {
        let current = parser.current();

        let unsigned = current.kind == TokenKind::Unsigned;
        if unsigned {
            parser.advance();
        }
        unsigned
    };

    // Consume the next token and dispatch on its kind.
    let current = parser.advance().to_owned();
    match current.kind {
        TokenKind::Identifier => {
            identifier_type(parser, unsigned, current).context("types::parse -> identifier")
        }
        TokenKind::Enum => enum_type(parser).context("types::parse -> Enum"),
        TokenKind::Struct => struct_type(parser).context("types::parse -> Struct"),
        other => {
            bail!(
                "types::parse: expected to find 'Identifier' || 'Enum' || 'Struct', found: {:?} -> parsing of this token kind as datatype is not supported",
                other
            )
        }
    }
}

pub fn wrap_data_type_in_an_array(
    mut data_type: DataType,
    parser: &mut Parser,
) -> Result<DataType> {
    // Collect the lengths in source order: `int c[1][2]` gives [1, 2].
    let mut lengths = Vec::new();
    while parser.current().kind == TokenKind::OpenBracket {
        parser.expect(TokenKind::OpenBracket)?;
        let length =
            parsing_functions::data_parsing::str_to_num(&parser.expect(TokenKind::Number)?.value)?;
        parser.expect(TokenKind::CloseBracket)?;
        lengths.push(length);
    }

    // Wrap from the last dimension to the first, so that the first pair of
    // brackets becomes the outermost array.
    for length in lengths.into_iter().rev() {
        data_type = DataType::Array {
            length,
            inside: Box::new(data_type),
        };
    }

    Ok(data_type)
}

fn identifier_type(parser: &mut Parser, unsigned: bool, current: Token) -> Result<DataType> {
    let mut output = DataType::Data {
        name: current.value,
        unsigned,
    };
    // Each `*` adds one more pointer layer around the type built so far.
    while parser.current().kind == TokenKind::Star {
        output = DataType::Pointer(Box::new(output));
        parser.advance();
    }
    Ok(output)
}

fn enum_type(parser: &mut Parser) -> Result<DataType> {
    parser.expect(TokenKind::OpenCurly)?;
    let mut current_value: i32 = 0;
    let mut fields = Vec::new();
    let mut end = false;
    while !end {
        let field_name = parser.expect(TokenKind::Identifier)?.value;
        match parser.advance().kind {
            TokenKind::Equals => {
                let sign: i32 = if parser.current().kind == TokenKind::Minus {
                    parser.advance();
                    -1
                } else {
                    1
                };
                current_value = str_to_num(&parser.expect(TokenKind::Number)?.value)? as i32 * sign;
                end = parser.advance().kind == TokenKind::CloseCurly;
            }
            TokenKind::Comma => {}
            TokenKind::CloseCurly => {
                end = true;
            }
            kind => {
                bail!(
                    "expected to find token of kind: 'Comma' || 'Equals' || 'CloseCurly', found: '{kind:?}'"
                )
            }
        }

        fields.push(EnumField {
            name: field_name,
            value: current_value,
        });

        current_value += 1;
    }
    Ok(DataType::Enum { fields })
}

fn struct_type(parser: &mut Parser) -> Result<DataType> {
    parser.expect(TokenKind::OpenCurly)?;
    let mut properties = Vec::new();
    while parser.current().kind != TokenKind::CloseCurly {
        let data_type = parse(parser)?;
        let name = parser.expect(TokenKind::Identifier)?.value;
        parser
            .expect(TokenKind::SemiColon)
            .context("expected to find a semicolon after an expression - struct contents")?;

        properties.push(Property {
            var_name: name,
            var_type: data_type,
        });
    }

    parser.expect(TokenKind::CloseCurly)?;
    Ok(DataType::Struct { properties })
}
If you find anything to improve in this project's code, please create an issue describing it on the GitHub repository for this project.