Introducing a command-line tool to decode protocol buffer messages without a protobuf schema definition.
If you regularly work with protobuf services and messaging, sooner or later you will need to decode a binary protobuf message for debugging.
Normally you have the application with the compiled proto schema at hand, and you can use it to parse and inspect the message in question.
But sometimes that is not the case, and you need a quick way to peek into the contents of the message.
Aleka is a tool that helps in this situation. The full source code is available at https://github.com/miguelabate/aleka
In short, Aleka takes a binary protobuf input (encoded as a base64 or hex string), decodes it following the protobuf wire format, and generates JSON as output.
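To give a rough idea of what "following the wire format" means: every field on the wire starts with a varint key whose low three bits are the wire type and whose remaining bits are the field number. Here is a minimal sketch in Rust (this is an illustration, not Aleka's actual code):

```rust
// Decode a base-128 varint (protobuf's variable-length integer) from
// `buf` starting at `pos`; returns the value and the next position.
fn read_varint(buf: &[u8], mut pos: usize) -> (u64, usize) {
    let mut value: u64 = 0;
    let mut shift = 0;
    loop {
        let byte = buf[pos];
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return (value, pos);
        }
        shift += 7;
    }
}

fn main() {
    // 0x0A 0x07 "Michael" encodes: field 1, wire type 2
    // (length-delimited), length 7, then the payload bytes.
    let buf = b"\x0a\x07Michael";
    let (key, pos) = read_varint(buf, 0);
    let field_number = key >> 3; // 1
    let wire_type = key & 0x07;  // 2 = length-delimited
    let (len, pos) = read_varint(buf, pos);
    let payload = &buf[pos..pos + len as usize];
    println!(
        "field {} (wire type {}): {}",
        field_number,
        wire_type,
        String::from_utf8_lossy(payload)
    ); // field 1 (wire type 2): Michael
}
```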
Of course, there are some caveats when you decode without a schema as a reference.
For starters, decoding is ambiguous: some types share the same encoding (for example, fixed32 and float). In these cases the output shows every possible interpretation of the value and leaves it to the user to make sense of them.
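For instance, the four bytes that encode the float 1.83 in the example later in this post are equally valid as the unsigned integer 1072315761; without a schema, both readings are legitimate:

```rust
fn main() {
    // The 4 bytes on the wire for the `height` field in the worked
    // example further down (little-endian).
    let bytes = [0x71u8, 0x3d, 0xea, 0x3f];
    let as_fixed32 = u32::from_le_bytes(bytes); // 1072315761
    let as_float = f32::from_le_bytes(bytes);   // 1.83
    println!("fixed32: {}, float: {}", as_fixed32, as_float);
}
```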
Continuing with the ambiguity, there is the more delicate case of strings, bytes, and embedded messages. These three types are encoded identically on the wire, so it is a little tricky to figure out which one was intended.
The approach Aleka takes is to first try decoding such a payload both as a nested proto message and as a string. Each interpretation that succeeds is added to the decoded output; if both fail, the payload is treated as a plain sequence of bytes and added to the output as that.
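That heuristic can be sketched roughly like this (a simplified illustration, not Aleka's actual code): check whether the payload parses cleanly as a sequence of protobuf fields, check whether it is valid UTF-8, and fall back to raw bytes when both checks fail.

```rust
// Returns the plausible interpretations of a length-delimited payload.
fn interpretations(payload: &[u8]) -> Vec<&'static str> {
    let mut kinds = Vec::new();
    if looks_like_message(payload) {
        kinds.push("message");
    }
    if std::str::from_utf8(payload).is_ok() {
        kinds.push("string");
    }
    if kinds.is_empty() {
        kinds.push("bytes");
    }
    kinds
}

// Rough validity check: walk the buffer as protobuf fields and make
// sure every key, wire type, and length stays in bounds.
fn looks_like_message(buf: &[u8]) -> bool {
    let mut pos = 0;
    while pos < buf.len() {
        let (key, next) = match read_varint(buf, pos) {
            Some(v) => v,
            None => return false,
        };
        pos = next;
        if key >> 3 == 0 {
            return false; // field numbers start at 1
        }
        match key & 0x07 {
            0 => match read_varint(buf, pos) { // varint
                Some((_, next)) => pos = next,
                None => return false,
            },
            1 => pos += 8,                     // fixed64
            2 => match read_varint(buf, pos) { // length-delimited
                Some((len, next)) => pos = next + len as usize,
                None => return false,
            },
            5 => pos += 4,                     // fixed32
            _ => return false,                 // invalid wire type
        }
        if pos > buf.len() {
            return false;
        }
    }
    true
}

fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let (mut value, mut shift) = (0u64, 0);
    loop {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
    }
}

fn main() {
    // "PT" happens to parse both as a message (field 10, varint 84)
    // and as UTF-8 text, so both interpretations are reported.
    println!("{:?}", interpretations(b"PT"));      // ["message", "string"]
    println!("{:?}", interpretations(b"Michael")); // ["string"]
}
```

You will see this exact ambiguity show up in the example output at the end of the post.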
Output format
// a proto message
Message {
    fields: array of Field
}
// a field of the proto message: it can hold a value, repeated values,
// a submessage, or a list of submessages.
// Both values and messages can be populated at the same time, when the
// decoding is ambiguous and the payload is valid both as a nested
// proto message and as a string.
Field {
    field_number: int, the field number in the proto
    values: array of Value
    messages: array of Message
}
// Since there is no schema, a value can have different interpretations
// depending on how it is decoded; the list holds each possibility.
Value {
    value_representations: array of ValueRepresentation
}
ValueRepresentation {
    value: the value as a string
    format_type: the type used to interpret this value
}
Example
Let's say we have the following proto definition:
message Person {
string name = 1;
int32 age = 2;
float height = 3;
repeated string nicknames = 4;
repeated Address addresses = 5;
sint32 account_balance = 7;
}
message Address {
string street = 1;
string number = 2;
string country = 3;
}
And the following message instance (Rust code, using the tonic crate to handle proto messages):
Person{
name: "Michael".to_owned(),
age: 20,
height: 1.83,
nicknames: vec!["Mike".to_owned(), "The Big M".to_owned()],
addresses: vec![Address{
street: "Fake Street".to_string(),
number: "123".to_string(),
country: "NL".to_string()
},
Address{
street: "Random Street".to_string(),
number: "112334".to_string(),
country: "PT".to_string()
}],
account_balance: -100
};
The message encoded as base64 is:
CgdNaWNoYWVsEBQdcT3qPyIETWlrZSIJVGhlIEJpZyBNKhYKC0Zha2UgU3RyZWV0EgMxMjMaAk5MKhsKDVJhbmRvbSBTdHJlZXQSBjExMjMzNBoCUFQ4xwE=
When running aleka, the output should look like this:
> aleka -i b64 -d CgdNaWNoYWVsEBQdcT3qPyIETWlrZSIJVGhlIEJpZyBNKhYKC0Zha2UgU3RyZWV0EgMxMjMaAk5MKhsKDVJhbmRvbSBTdHJlZXQSBjExMjMzNBoCUFQ4xwE=
{
"fields": [
{
"field_number": 1,
"values": [
{
"value_representations": [
{
"value": "Michael",
"format_type": "String"
}
]
}
]
},
{
"field_number": 2,
"values": [
{
"value_representations": [
{
"value": "20",
"format_type": "Varint(int32/int64/uint32/uint64)"
},
{
"value": "10",
"format_type": "Varint(sint32/sint64)"
}
]
}
]
},
{
"field_number": 3,
"values": [
{
"value_representations": [
{
"value": "1072315761",
"format_type": "fixed32"
},
{
"value": "1.83",
"format_type": "float"
}
]
}
]
},
{
"field_number": 4,
"values": [
{
"value_representations": [
{
"value": "Mike",
"format_type": "String"
}
]
},
{
"value_representations": [
{
"value": "The Big M",
"format_type": "String"
}
]
}
]
},
{
"field_number": 5,
"values": [
{
"value_representations": [
{
"value": "\n\u000bFake Street\u0012\u0003123\u001a\u0002NL",
"format_type": "String"
}
]
},
{
"value_representations": [
{
"value": "\n\rRandom Street\u0012\u0006112334\u001a\u0002PT",
"format_type": "String"
}
]
}
],
"messages": [
{
"fields": [
{
"field_number": 1,
"values": [
{
"value_representations": [
{
"value": "Fake Street",
"format_type": "String"
}
]
}
]
},
{
"field_number": 2,
"values": [
{
"value_representations": [
{
"value": "123",
"format_type": "String"
}
]
}
]
},
{
"field_number": 3,
"values": [
{
"value_representations": [
{
"value": "NL",
"format_type": "String"
}
]
}
]
}
]
},
{
"fields": [
{
"field_number": 1,
"values": [
{
"value_representations": [
{
"value": "Random Street",
"format_type": "String"
}
]
}
]
},
{
"field_number": 2,
"values": [
{
"value_representations": [
{
"value": "112334",
"format_type": "String"
}
]
}
]
},
{
"field_number": 3,
"values": [
{
"value_representations": [
{
"value": "PT",
"format_type": "String"
}
]
}
],
"messages": [
{
"fields": [
{
"field_number": 10,
"values": [
{
"value_representations": [
{
"value": "84",
"format_type": "Varint(int32/int64/uint32/uint64)"
},
{
"value": "42",
"format_type": "Varint(sint32/sint64)"
}
]
}
]
}
]
}
]
}
]
}
]
},
{
"field_number": 7,
"values": [
{
"value_representations": [
{
"value": "199",
"format_type": "Varint(int32/int64/uint32/uint64)"
},
{
"value": "-100",
"format_type": "Varint(sint32/sint64)"
}
]
}
]
}
]
}
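A note on the Varint(sint32/sint64) rows above: sint fields use ZigZag encoding, which maps signed integers onto unsigned ones (0→0, -1→1, 1→2, -2→3, ...). That is why the wire value 199 also reads as -100 for field 7, and 20 also reads as 10 for field 2. Decoding it is a one-liner:

```rust
// ZigZag decode: undo protobuf's signed-integer mapping for sint fields.
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

fn main() {
    println!("{}", zigzag_decode(199)); // -100 (field 7, account_balance)
    println!("{}", zigzag_decode(20));  // 10   (the sint reading of field 2)
}
```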
Conclusion
I hope this is a useful tool to put out there. In any case, building it is a good exercise in understanding the protobuf encoding.
The full source code can be found at https://github.com/miguelabate/aleka