Topic: a2l -> db parser
Chris65
Newbie
*

Karma: +3/-0
Offline

Posts: 10


« on: October 12, 2022, 07:29:24 PM »

The following PHP script does a pretty good job of parsing an A2L file and populating an SQLite DB with the essential details.

https://github.com/Turbo-Tuning/a2l-db
Logged

prj
Hero Member
*****

Karma: +1072/-481
Offline

Posts: 6037


« Reply #1 on: October 13, 2022, 01:26:37 AM »

Skimming through it, a lot is not implemented, and many newer A2Ls will break it.
Escape parsing is wrong as well.

Just a few observations from someone who has their own parser.
Logged

PM's will not be answered, so don't even try.
Log your car properly - WinOLS database - Tools/patches
Chris65
« Reply #2 on: October 13, 2022, 01:44:08 AM »

Quote
Skimming through it, a lot is not implemented, and many newer A2Ls will break it.
Escape parsing is wrong as well.

Just a few observations from someone who has their own parser.

Thanks prj, you're correct: I left a lot out. I only implemented what was essential to generate an XDF. Sorry to trouble you, but you say newer A2Ls will break it: could you enlighten me? I will improve it.
Escape parsing: wouldn't htmlspecialchars() take care of it?

TIA,
Chris
Logged

prj
« Reply #3 on: October 13, 2022, 02:01:17 AM »

COEFFS_LINEAR not implemented.
FORMULA not implemented.
COMPU_VTAB_RANGE not implemented.
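
For reference, two of the missing conversion types are small enough to sketch. This is a minimal Python illustration, not code from a2l-db: the function and class names are made up, COEFFS_LINEAR is taken to mean physical = a * raw + b, and COMPU_VTAB_RANGE bounds are assumed to be inclusive on both ends. FORMULA is left out because it needs an expression evaluator.

```python
from dataclasses import dataclass, field


def coeffs_linear(a: float, b: float):
    """Return a raw -> physical converter for COEFFS_LINEAR a b.

    Assumption: physical = a * raw + b.
    """
    return lambda raw: a * raw + b


@dataclass
class VtabRange:
    """Hypothetical holder for a COMPU_VTAB_RANGE table.

    ranges is a list of (lower, upper, text) triples; bounds are
    assumed inclusive. default is returned when nothing matches.
    """
    ranges: list = field(default_factory=list)
    default: str = ""

    def lookup(self, raw: float) -> str:
        # Linear scan is fine for the handful of rows these tables hold.
        for lower, upper, text in self.ranges:
            if lower <= raw <= upper:
                return text
        return self.default


# Illustrative values only, not taken from a real A2L.
to_degrees = coeffs_linear(0.75, -48.0)
state = VtabRange([(0, 0, "OFF"), (1, 3, "ON"), (4, 255, "ERROR")], "UNKNOWN")
```

A parser would build one such converter per COMPU_METHOD and attach it to every MEASUREMENT or CHARACTERISTIC that references it.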

A2L has some special escaping rules. I don't remember them off the top of my head anymore, but I think your tokenizer will break in some situations.

This is just after a quick look.
Try some Conti (Continental) A2Ls.

Logged

prj
« Reply #4 on: October 13, 2022, 02:04:23 AM »

P.S. I don't think SQLite is the way to go, or SQL at all for that matter. Some A2Ls are now 100+ MB, and those will be very slow to navigate and work with due to the overhead of SQLite.
Binary serialization (e.g. protobuf or msgpack) is better and much faster, but you have to implement reference pooling.

But if you just want to generate XDF, I'm not sure why you even need to go to SQL.
Logged

Chris65
« Reply #5 on: October 13, 2022, 02:52:09 AM »

Quote
P.S. I don't think SQLite is the way to go, or SQL at all for that matter. Some A2Ls are now 100+ MB, and those will be very slow to navigate and work with due to the overhead of SQLite.
Binary serialization (e.g. protobuf or msgpack) is better and much faster, but you have to implement reference pooling.

But if you just want to generate XDF, I'm not sure why you even need to go to SQL.

Thanks! I didn't think about protobuf. You certainly don't need SQL for XDF, but it sure helps to make things easier: once you have parsed the data, it beats having to parse it again and again while developing and debugging the XDF writer code :P

Quote
COEFFS_LINEAR not implemented.
FORMULA not implemented.
COMPU_VTAB_RANGE not implemented.
Point taken. I will add these and merge the changes.

Cheers!
Chris
Logged

d3irb
Full Member
***

Karma: +134/-1
Offline

Posts: 195


« Reply #6 on: October 13, 2022, 02:58:12 PM »

Have you seen https://github.com/christoph2/pyA2L (not mine, although I use it in some projects)? Very similar idea but in Python. It's slow to import (on the parsing side, not the sqlite side), but once the A2L is loaded up in sqlite, perf is very good.

The grammar is quite complete, although the escaping is still kind of busted. prj is right: Continental A2Ls are the ultimate test for an A2L grammar. They abuse the escaping system in every imaginable way, with stuff like comments in string tokens, escaped quotes, escaped quotes WITH comments in string tokens, and so on.
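
To make the failure modes concrete, here is a minimal tokenizer sketch in Python. It is not taken from pyA2L or a2l-db, and `tokenize` is a made-up name. It assumes two quote escapes inside a string token (a doubled `""` and a backslash `\"`) and treats `/* */` and `//` as comments only outside strings. Real files may need more rules than this.

```python
# Backslash escapes that map to something other than the escaped character.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def tokenize(text: str) -> list:
    """Split A2L-like text into ("WORD", s) and ("STRING", s) tuples."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("/*", i):          # block comment
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif text.startswith("//", i):          # line comment
            end = text.find("\n", i)
            i = n if end == -1 else end + 1
        elif ch == '"':                         # string token
            i += 1
            buf = []
            while i < n:
                c = text[i]
                if c == "\\" and i + 1 < n:     # backslash escape
                    buf.append(_ESCAPES.get(text[i + 1], text[i + 1]))
                    i += 2
                elif c == '"' and text.startswith('""', i):
                    buf.append('"')             # doubled quote
                    i += 2
                elif c == '"':                  # closing quote
                    i += 1
                    break
                else:                           # includes /* and // in strings
                    buf.append(c)
                    i += 1
            tokens.append(("STRING", "".join(buf)))
        else:                                   # bare word such as /begin or 0x10
            start = i
            while (i < n and not text[i].isspace() and text[i] != '"'
                   and not text.startswith("/*", i)
                   and not text.startswith("//", i)):
                i += 1
            tokens.append(("WORD", text[start:i]))
    return tokens
```

The key design point is that comment detection happens only in the outer loop, so a `/*` inside a string is just two more characters of that string.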

I disagree that SQLite is a bad tool for A2L: you have relational data, so you store it in a relational database. Performance is quite reasonable: a database of a few hundred MB is child's play for SQLite, and any reasonable query will complete in a few µs on modern hardware. If you're clever you can do some compression tricks and startup time becomes negligible as well. With some simple queries you can join together whatever axis refs, formulas, etc. you need to perform a given task. Of course you can do better by writing some bespoke serialization format and query engine, but I doubt the time saved is worth it unless you are making a big commercial project.
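
As an illustration of the "simple queries" point, here is a minimal Python sketch using the standard sqlite3 module. The two-table schema, the column names, and the sample row are invented for the example and do not match the schema of pyA2L or a2l-db.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compu_method (
            name TEXT PRIMARY KEY, a REAL, b REAL, unit TEXT
        );
        CREATE TABLE characteristic (
            name TEXT PRIMARY KEY,
            address INTEGER,
            conversion TEXT REFERENCES compu_method(name)
        );
        INSERT INTO compu_method VALUES ('CM_ANGLE', 0.75, -48.0, 'deg');
        INSERT INTO characteristic VALUES ('KFZW', 114688, 'CM_ANGLE');
        """
    )
    return conn


def describe(conn: sqlite3.Connection, name: str):
    """Return a characteristic joined with its conversion, or None."""
    return conn.execute(
        """
        SELECT c.name, c.address, m.name, m.a, m.b, m.unit
        FROM characteristic AS c
        JOIN compu_method AS m ON m.name = c.conversion
        WHERE c.name = ?
        """,
        (name,),
    ).fetchone()
```

One join returns everything needed to read and scale a map, which is the kind of lookup an XDF writer performs for every entry.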
Logged
Chris65
« Reply #7 on: October 13, 2022, 10:19:24 PM »

Yes, I have seen pyA2L. The fact that it cannot handle UTF-8 and is extremely slow encouraged me to write my own parser. I agree that its grammar is quite complete.

I have seen some really weird escaping in "longDescs" in some A2L files. Things like "blah blah ""2"" more blah \"""error blah""\" (final "blah")". I took multiple steps to overcome these. It could have been done more elegantly, but it works.

Cheers!
Chris
Logged

DT
Full Member
***

Karma: +20/-1
Offline

Posts: 184


« Reply #8 on: October 14, 2022, 07:43:59 AM »

I've not checked your code, but you might need to check whether it handles 4B0906018BG_0009.A2L correctly (search for FUNCTION_LIST).
I think my file is unmodified: its modified date matches the date tag inside the file.
Logged

prj
« Reply #9 on: October 14, 2022, 02:04:53 PM »

For my needs SQL is way too slow.
Right now I am able to load the full graph in 1-2 seconds and decrease the size of the A2L by 30-40x.

But I have some extreme requirements too :)

I usually get modern Conti A2Ls parsed in <3 s, but my parser is multithreaded.
So even if you're modifying the XDF outputter, I don't see any reason not to re-parse the A2L.
If it takes you longer than 5 seconds for modern A2Ls, you're probably doing something wrong or haven't optimized enough.

I would not really call A2L relational data.
There are actually very few relations in the file. There's pretty much only MEASUREMENT -> COMPUTAB and CHARACTERISTIC -> AXIS/COMPUTAB.
Using a SQL database does not make sense to me. The data is much better represented by a dictionary (hashtable) during serialization (convert to an array pool and store int references). You can deserialize everything into memory very fast anyway, and then you have an object graph.
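
The pooling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: JSON from the standard library stands in for msgpack or protobuf, the record layout is invented, and `serialize` and `deserialize` are made-up names rather than anyone's actual implementation.

```python
import json


def serialize(compu_methods: dict, measurements: list) -> bytes:
    """Store each conversion once in a pool; rows keep an int index into it."""
    index = {}   # conversion name -> slot in the pool
    pool = []    # every conversion appears exactly once
    for name, method in compu_methods.items():
        index[name] = len(pool)
        pool.append({"name": name, **method})
    rows = [
        [m["name"], m["address"], index[m["conversion"]]]
        for m in measurements
    ]
    return json.dumps({"pool": pool, "rows": rows}).encode("utf-8")


def deserialize(blob: bytes) -> list:
    """Rebuild the object graph; rows with the same index share one object."""
    data = json.loads(blob.decode("utf-8"))
    pool = data["pool"]
    return [
        {"name": name, "address": address, "conversion": pool[ref]}
        for name, address, ref in data["rows"]
    ]


# Illustrative data only.
methods = {"CM_RPM": {"unit": "rpm", "a": 0.25, "b": 0.0}}
measured = [
    {"name": "nmot", "address": 0x380000, "conversion": "CM_RPM"},
    {"name": "nmot_w", "address": 0x380002, "conversion": "CM_RPM"},
]
graph = deserialize(serialize(methods, measured))
```

After loading, both measurements point at the same conversion object, so no name lookup is needed at use time and shared data is stored only once on disk.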

I don't think you really need to query it for anything beyond the ID and name of the map either. So it's a lot of overhead for no gain.
« Last Edit: October 14, 2022, 02:11:28 PM by prj » Logged

Chris65
« Reply #10 on: October 18, 2022, 04:21:42 AM »

Following prj's suggestion, I rewrote the parser to write XML instead (this made it 50x faster), and it now handles all A2L data types.

Also added xmlQuery.php to show how to query the XML.
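
For readers who do not use PHP, the same idea can be sketched in Python with the standard xml.etree.ElementTree module. The element and attribute names below are invented for the example. They are an assumption and not the layout that a2l-db or xmlQuery.php actually uses.

```python
import xml.etree.ElementTree as ET


def build_xml(characteristics: list) -> str:
    """Write a list of dicts as a small XML document (hypothetical layout)."""
    root = ET.Element("a2l")
    for item in characteristics:
        node = ET.SubElement(
            root, "characteristic", name=item["name"], type=item["type"]
        )
        ET.SubElement(node, "address").text = hex(item["address"])
    return ET.tostring(root, encoding="unicode")


def find_address(xml_text: str, name: str):
    """Look up one characteristic by name; return its address or None."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    node = root.find(f"./characteristic[@name='{name}']/address")
    return None if node is None else int(node.text, 16)


# Illustrative data only.
doc = build_xml([
    {"name": "KFZW", "type": "MAP", "address": 0x1C000},
    {"name": "NMAX", "type": "VALUE", "address": 0x1C200},
])
```

The name is interpolated into the path for brevity, so this only works for names without quote characters.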

I will add data compression later, which will reduce size 10x.

https://github.com/Turbo-Tuning/a2l-db
Logged

Chris65
« Reply #11 on: October 19, 2022, 08:22:17 PM »

Quote
I've not checked your code, but you might need to check whether it handles 4B0906018BG_0009.A2L correctly (search for FUNCTION_LIST).
I think my file is unmodified: its modified date matches the date tag inside the file.

I somehow missed this. That file has non-printable characters in the description fields. I added a preg_replace for that. Thanks for pointing it out. It took <1 sec to generate the output.
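
A minimal Python equivalent of that kind of cleanup is shown below. The exact pattern used in a2l-db is not given in this thread, so the character class here is an assumption: it drops ASCII control characters but keeps tab, newline, and carriage return, and leaves non-ASCII text such as umlauts alone.

```python
import re

# ASCII control characters except \t (0x09), \n (0x0a) and \r (0x0d).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Remove non-printable control characters from a description field."""
    return _CONTROL_CHARS.sub("", text)
```

Running this on each string token before it is written out keeps stray bytes from producing invalid XML.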
Logged

d3irb
« Reply #12 on: October 20, 2022, 10:49:18 AM »

Quote
Following prj's suggestion, I rewrote the parser to write XML instead (this made it 50x faster), and it now handles all A2L data types.

Also added xmlQuery.php to show how to query the XML.

I will add data compression later, which will reduce size 10x.

https://github.com/Turbo-Tuning/a2l-db

This is extremely cool. Well done!
Logged
Chris65
« Reply #13 on: November 13, 2022, 12:26:37 AM »

I pushed an update yesterday with several improvements to the parser. The arrayManager is experimental.
« Last Edit: November 13, 2022, 02:00:04 AM by Chris65 » Logged

Chris65
« Reply #14 on: March 04, 2023, 03:38:11 AM »

FWIW, I made some changes, and you can customise it to your needs.
Logged
