Viewing/exporting from EHM 1 saved game files

Post by **archibalduk** » Mon Feb 27, 2023 9:23 pm

I thought it might be useful to put together a short guide on how to get started with exporting data from EHM saved games. I've had a few requests for guidance on how to get started, so it makes sense to put it all here for others to find. This guide and thread are based on viewing the data saved in the .sav files rather than viewing data loaded into the RAM (as the EHM Assistant does). If you are interested in looking at what is loaded into the RAM then I suggest taking a look at the old Art Money/Custom Start Date thread for EHM 2007 (see here) which discusses some principles of how to go about doing this.

It's worth noting that the saved game structure has never been made public and so figuring out what it all means involves a lot of guesswork and experimentation. For this reason, editing .sav files beyond some very simple changes is not really possible as you would generally need to update all of the relevant primary keys in the file in order to reflect any added/deleted entries. This is not possible without knowing the full structure of each part of the .sav file. Hence this guide is focussed on exporting data from saved games (such as match stats) rather than editing. But this guide would of course be a helpful starting point for those looking to potentially edit data in saved games as the same principles would apply.

The below is written based on my findings from experimenting with EHM 2007 saved games. However, it looks like the overall structure of the EHM 1 saved game files is the same, so the below should equally apply.

Getting Started
You will need the following tools in order to get started:

A copy of EHM Editor: https://www.ehmtheblueline.com/editor (at time of writing, only EHM Editor v1 supports saved games; EHM Editor v2 does not yet);
A hex editor of your choice. There are lots of free options out there. I purchased a copy of HHD Hex Editor Neo as it was cheap and has a lot of useful features. There is also a free version of HHD; and
Something along the lines of the Calculator that comes with Windows or access to an online hex to decimal converter (such as here: https://www.thecalculatorsite.com/math/ ...).

Knowledge of Databases
A very basic understanding of how databases work is ideal but not essential. The minimum you need to know is that databases use what is known as a primary key:

https://www.techtarget.com/searchdatamanagement/definition/primary-key wrote: A primary key, also called a primary keyword, is a column in a relational database table that's distinctive for each record. It's a unique identifier, such as a driver's license number, telephone number with area code or vehicle identification number (VIN). A relational database must have only one primary key. Every row of data must have a primary key value and none of the rows can be null.

As you will see from the EHM 2007 database structure, primary keys are used throughout the database and I expect the same is the case for all parts of the saved game given that this is an elementary part of database design (from what I can tell, the .sav saved game is just one big database of sorts). An example of a primary key is Club ID. Every club in the database/saved game must have a Club ID. This ID is then referenced in other parts of the database; e.g. to assign a particular club as a player's Club Contracted or Club Playing For field, you would enter the relevant club's Club ID. So when you are looking through the various parts of the saved game, it is generally things such as Staff ID, Club ID and Club Competition ID which you're looking out for.

Knowledge of Data/Coding
A very basic understanding of how binary data is stored in files is also ideal but not essential. The minimum you need to know is that each byte of binary data is represented in hexadecimal format (also known simply as hex). There are different data types which are of varying length in bytes. The main ones are as follows:

True/False:
Bool: 1 byte (0 = false / 1 = true / any non-zero number = true)

Whole numbers:
Char: 1 byte
Short: 2 bytes
Integer (aka Int): 4 bytes
Long Integer (aka Long Int): 8 bytes

Decimal numbers:
Float: 4 bytes
Double: 8 bytes

Given that each byte can represent a finite number of possible values, the larger data types can represent a wider range of numbers. It is possible for chars, shorts, ints and long ints to be unsigned (meaning that the lowest possible value is zero) or signed (meaning that the lowest possible value is a negative number). Details of ranges for each data type (both signed and unsigned) can be found here: https://www.tutorialspoint.com/cplusplu ... _types.htm
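
As a rough illustration of the sizes and ranges above, here is a short C++ sketch using the fixed-width types from <cstdint>. The mapping of char/short/int/long int onto int8_t/int16_t/int32_t/int64_t is my own assumption for illustration; the types actually used in the game's source code are not known.

```cpp
#include <cstdint>
#include <limits>

// Fixed-width equivalents of the data types listed above (assumed mapping)
static_assert(sizeof(std::int8_t) == 1, "char: 1 byte");
static_assert(sizeof(std::int16_t) == 2, "short: 2 bytes");
static_assert(sizeof(std::int32_t) == 4, "int: 4 bytes");
static_assert(sizeof(std::int64_t) == 8, "long int: 8 bytes");

// Range of a signed short versus an unsigned short
constexpr std::int16_t short_min = std::numeric_limits<std::int16_t>::min();    // -32,768
constexpr std::int16_t short_max = std::numeric_limits<std::int16_t>::max();    // 32,767
constexpr std::uint16_t ushort_min = std::numeric_limits<std::uint16_t>::min(); // 0
constexpr std::uint16_t ushort_max = std::numeric_limits<std::uint16_t>::max(); // 65,535
```

Both short types occupy the same two bytes; only the interpretation of the highest bit differs.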

As you will see from the EHM 2007 database structure, primary keys (e.g. Club ID, Nation ID, Staff ID, etc) are signed ints. The first ID in each table is always zero. Usually a -1 (or sometimes a -2) means no ID - for example, a free agent would have their Club Contracted and Club Playing For set to -1 to denote that they are not contracted to or playing for any club.

It is important to note that the .sav file seems to use little endian (as opposed to big endian) format. So take for example the number 45,102 which is B02E in hex code. As an integer (being 4 bytes) this can be represented as 2E B0 00 00 (little endian) or 00 00 B0 2E (big endian). As we are using little endian format, we'd go with 2E B0 00 00. So to decode this, you'd need to read the bytes in reverse order. If you open the Calculator in Windows and click on the menu and select Programmer, you will see that there is the option to enter hex format. Click on hex and then enter the little endian pairs in reverse and you will see that there is a line which shows the decimal equivalent. Let's take the example of 2E B0 00 00: Enter each pair from right to left into the calculator, so enter "00", "00", "B0", "2E" (you will find that the calculator ignores the initial zeros which is fine). You'll see that the DEC line in the calculator shows 45,102 as the result.

This is a really important concept to understand in order to be able to "read" the hex code in the .sav files and identify patterns. If this isn't clear then try Googling "endianness" or "little vs big endian" for more examples.
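
If you would rather do the conversion in code than in the Calculator, something along these lines should work. This is only a sketch: read_int_le and read_short_le are names I have made up for illustration, and the caller is assumed to pass a buffer with enough bytes in it.

```cpp
#include <cstddef>
#include <cstdint>

// Decode a 4-byte little endian int starting at data[offset].
// The lowest byte comes first, so each later byte is shifted further left.
inline std::int32_t read_int_le(const unsigned char *data, std::size_t offset = 0) {
	const std::uint32_t value =
		static_cast<std::uint32_t>(data[offset]) |
		(static_cast<std::uint32_t>(data[offset + 1]) << 8) |
		(static_cast<std::uint32_t>(data[offset + 2]) << 16) |
		(static_cast<std::uint32_t>(data[offset + 3]) << 24);
	return static_cast<std::int32_t>(value); // Reinterpret as a signed int
}

// Decode a 2-byte little endian short starting at data[offset]
inline std::int16_t read_short_le(const unsigned char *data, std::size_t offset = 0) {
	const std::uint16_t value = static_cast<std::uint16_t>(
		data[offset] | (data[offset + 1] << 8));
	return static_cast<std::int16_t>(value);
}
```

Passing in the bytes 2E B0 00 00 gives 45,102, and FF FF FF FF gives -1 (i.e. the usual "no ID" value for a signed int).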

In addition to the above, you will find that text strings are either represented as:

an array of ASCII chars (i.e. one byte per character); or
a UTF-style string (i.e. two bytes per character)

Take for example the word "Dave". In hex D = 44, a = 61, v = 76 and e = 65. So as a char array, "Dave" would look like this:

Code: Select all

44 61 76 65

Char arrays are generally of a fixed size for each field in a table, so if the field was fixed as 10 bytes long it would look like this:

Code: Select all

44 61 76 65 00 00 00 00 00 00
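
A fixed-size char array like this can be turned into a normal string by stopping at the first zero byte. The function below is a minimal sketch (read_char_array is a made-up name) and it assumes that the unused part of the field is padded with zeros as in the example above.

```cpp
#include <cstddef>
#include <string>

// Convert a fixed-size ASCII char array into a std::string, stopping at the
// first zero byte (the padding) or at field_size if the field is full.
inline std::string read_char_array(const char *field, std::size_t field_size) {
	std::size_t length = 0;
	while(length < field_size && field[length] != '\0')
		++length;
	return std::string(field, length);
}
```

The field_size check matters because a name which fills the whole field would have no zero byte at the end.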

A string is slightly different as there is first an int denoting how long the string is (so each string field can vary in size) followed by two bytes per character. So Dave would look like this:

Code: Select all

04 00 00 00 44 00 61 00 76 00 65 00 0b 00 00 00

So broken down this is:

04 00 00 00 = denotes that the string is 4 characters long
44 00 = D
61 00 = a
76 00 = v
65 00 = e
0b 00 = denotes that this is the end of the string (also known as a null character)
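
In code, reading this kind of string could look something like the sketch below. The function name read_string is made up, and it only reads the length and the characters; whatever follows the final character is left untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a length-prefixed string: a 4-byte little endian character count
// followed by two bytes (little endian) per character.
// Returns an empty string if the buffer is too short for the stated length.
inline std::u16string read_string(const std::vector<unsigned char> &data, std::size_t offset = 0) {
	if(offset > data.size() || data.size() - offset < 4)
		return {};

	const std::uint32_t length =
		static_cast<std::uint32_t>(data[offset]) |
		(static_cast<std::uint32_t>(data[offset + 1]) << 8) |
		(static_cast<std::uint32_t>(data[offset + 2]) << 16) |
		(static_cast<std::uint32_t>(data[offset + 3]) << 24);

	const std::size_t chars_start = offset + 4;
	if((data.size() - chars_start) / 2 < length)
		return {};

	std::u16string text;
	text.reserve(length);
	for(std::uint32_t i = 0; i < length; ++i) {
		const std::size_t pos = chars_start + 2 * static_cast<std::size_t>(i);
		text.push_back(static_cast<char16_t>(data[pos] | (data[pos + 1] << 8)));
	}
	return text;
}
```

Because the length comes first, you always know where the characters stop without having to search for an end marker.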

Helpfully most hex editors will show decoded char arrays and strings alongside the raw hex, which makes things much easier. Here's an example from the first_names.dat sub-file of a saved game:

The Saved Game Structure
The easiest way to see what is within a saved game is to open it with the EHM Editor. Note that the Editor and this guide only apply to uncompressed saved games (i.e. with the Save Compressed setting disabled in EHM). I have never looked at how compressed saved games are compressed, so you will need to disable the Save Compressed setting in EHM when saving the game you want to look at.

Having opened the saved game in the Editor, click on Data -> Saved Game Index. This lists out the constituent parts of the saved game. You will see that the saved game consists of a number of files stored within the .sav file. I will refer to these as sub-files within this guide for clarity. If you click on File -> Unpack, you can extract the sub-files into a folder of your choosing. This is the easiest way of accessing the sub-files.

The alternative way of accessing the sub-files is to write your own code in something like C++, C# or Python. The first 12 bytes of the .sav file consist of a header as follows:

Code: Select all

int compressed_flag (0 = uncompressed / 1 = compressed)
int header_flag (not sure what this means)
int sub_file_count (denotes the number of sub-files contained within the .sav file)

The next part of the .sav file consists of a list (or index) of the sub-files as set out below. The number of entries in this index is equal to the sub_file_count value referenced above.

Code: Select all

unsigned int file_pos (indicates the sub_file's position/address (in bytes) within the .sav file - e.g. 0 = first byte of the file, 1 = second byte of the file)
unsigned int file_size (indicates the size of the sub_file (in bytes))
char sub_file_name[260] (an ASCII char array (260 bytes in length) denoting the file name of the sub_file);

From the file_pos and the file_size we can pinpoint exactly where the sub-file is located within the .sav file. So if a sub-file has a position of 9,334,434 and a size of 10 then we know that sub-file is located at bytes 9,334,434 to 9,334,443 of the .sav file (note that the range is inclusive of the first byte in the range, so the final byte in the range is 9,334,443 and not 9,334,444). So we could extract that sub-file by extracting that data range. This is what the Unpack function of the Editor does.
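
To make the inclusive range concrete, here is a small sketch which works on a .sav file that has already been loaded into memory. The function names are made up for illustration and this is not how the Editor itself is written.

```cpp
#include <cstddef>
#include <vector>

// Position of the final byte of a sub-file (the range is inclusive of the
// first byte, hence the -1). Assumes file_size is greater than zero.
inline std::size_t last_byte_pos(std::size_t file_pos, std::size_t file_size) {
	return file_pos + file_size - 1;
}

// Copy one sub-file out of the .sav data.
// Returns an empty vector if the range falls outside the data.
inline std::vector<char> extract_sub_file(const std::vector<char> &sav,
                                          std::size_t file_pos, std::size_t file_size) {
	if(file_pos > sav.size() || file_size > sav.size() - file_pos)
		return {};

	const auto first = sav.begin() + static_cast<std::ptrdiff_t>(file_pos);
	return std::vector<char>(first, first + static_cast<std::ptrdiff_t>(file_size));
}
```

With a position of 9,334,434 and a size of 10, last_byte_pos returns 9,334,443.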

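So in C++, the code to parse the .sav file would be along the following lines (this assumes it is compiled for a little endian machine, such as a Windows PC, so that the ints can be read straight from the file):

Code: Select all

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// One entry of the sub-file index
struct IndexEntry {
	std::uint32_t file_pos;  // Position (in bytes) of the sub-file within the .sav file
	std::uint32_t file_size; // Size (in bytes) of the sub-file
	char sub_file_name[260]; // ASCII file name of the sub-file
};

int main() {
	std::ifstream file("game.sav", std::ios::in | std::ios::binary);
	if(!file)
		return 1; // Could not open the file

	// STEP 1: Read the header
	std::int32_t compressed_flag = 0, header_flag = 0, sub_file_count = 0;
	file.read(reinterpret_cast<char *>(&compressed_flag), sizeof(compressed_flag));
	file.read(reinterpret_cast<char *>(&header_flag), sizeof(header_flag));
	file.read(reinterpret_cast<char *>(&sub_file_count), sizeof(sub_file_count));

	// Abort if the header could not be read or the saved game is compressed
	if(!file || compressed_flag != 0)
		return 1;

	// STEP 2: Read each index entry
	std::vector<IndexEntry> index;

	for(int i = 0; i < sub_file_count; ++i) {
		IndexEntry index_entry;
		file.read(reinterpret_cast<char *>(&index_entry.file_pos), sizeof(index_entry.file_pos));
		file.read(reinterpret_cast<char *>(&index_entry.file_size), sizeof(index_entry.file_size));
		file.read(index_entry.sub_file_name, sizeof(index_entry.sub_file_name));
		if(!file)
			return 1; // The index is shorter than the header claims
		index.push_back(index_entry); // Store the entry for use in step 3
	}

	// STEP 3: Read each sub-file by iterating over each index entry
	// Personally I'd use something like for(const auto &index_item : index) but I've used a simpler form for clarity.
	for(std::size_t i = 0; i < index.size(); ++i) {
		const auto &index_item = index[i]; // Reference to the relevant entry of the index

		// The following code reads the binary data of each sub-file into a temporary buffer
		std::vector<char> buffer(index_item.file_size);
		file.seekg(index_item.file_pos, std::ios::beg); // Navigate to the position of the sub-file
		file.read(buffer.data(), static_cast<std::streamsize>(buffer.size())); // Read the sub-file into the buffer
	}

	return 0;
}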

The Sub-Files
Once you have extracted the sub-files from the .sav file then this is where the guesswork and experimentation takes place. The sub-files consist of a database.zdb file and a number of .dat and .tmp files. The database.zdb file is basically a copy of the starting database.db file which then appears to be modified by EHM as the game progresses. The EHM Editor parses the database.zdb file when loading a .sav game into the Editor and so the various editing screens within the Editor are populated using the data from the database.zdb file. So you can use the Editor to check what primary keys are assigned to various items (e.g. to check what Club ID is assigned to Anaheim Ducks, etc). It seems that the game doesn't store things such as player career history in the database.zdb file (I suppose it must be stored in another sub-file) and so you will find that no player career history entries appear in the Editor.

My guess is that the .dat sub-files contain permanent data (such as player stats, club histories, club competition histories, etc) whereas the .tmp files presumably store more temporary data relating to playable leagues, etc.

So how do you figure out what is in a sub-file? By opening it up in the hex editor of your choice and trying to identify patterns. I have found that the first 4 bytes of some sub-files is an int specifying the number of entries stored within that sub-file (but some sub-files do not appear to have this). That int is then followed by each entry (aka record). Typically each entry starts with an entry/record ID (i.e. primary key) which is usually an int (i.e. 4 bytes) but can sometimes be a char or possibly IIRC a short. Sometimes however the primary key is located later on within each entry, so it isn't absolutely always the first few bytes of an entry.

My starting point is to look at the first four bytes of the sub-file and convert this to decimal format. If it is a zero then it is probably a record ID (because the first record ID is always zero - i.e. 00 00 00 00 in hex as an int). If it is something else then it might be a record count (i.e. indicating the number of entries in the sub-file). There is always a possibility that the record count and/or record IDs might be a char (e.g. zero = 00 in hex) or short (e.g. zero = 00 00), so it's always worth trying that if no obvious pattern appears from looking at them as ints.

Assuming that the initial 1/2/4 bytes denote a record count as a char/short/int then you can try the following calculation: Take the file size in bytes of the sub-file and subtract the size of the record count (e.g. if the record count is 4 bytes then subtract 4 from your file size). Then divide that figure by the record count. So if you had an int record count of 4 and your file is 48 bytes in size: 48 - 4 = 44. 44 / 4 = 11 bytes. This may indicate the number of bytes per record, assuming that each record is of a fixed size. You can then open up the sub-file in your hex editor, delete the initial record count (e.g. the first 4 bytes) and set your hex editor to arrange the remaining data so that it displays one record per row. This option isn't always possible for every hex editor, but certainly HHD does this (which is one reason I use it). So again taking my example, if I delete the initial four bytes, I'm left with 44 bytes of data and I would then set HHD to show 11 bytes of data per row. This will nicely show one record per row which makes it much easier to interpret patterns.

The above of course assumes that each record has the same number of bytes per entry. There are some sub-files which appear to have variable sizes per entry or possibly just one very complicated entry. They are going to be particularly challenging, if not impossible, to decipher.
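
The record size calculation described above is easy to script if you want to run it against lots of sub-files. Here is a minimal sketch (guess_record_size is a made-up name). It returns 0 when the numbers do not divide evenly, which would suggest that the sub-file does not consist of a leading record count followed by fixed-size records.

```cpp
#include <cstdint>

// Guess the fixed record size of a sub-file which starts with a record count.
// count_size is the size in bytes of the record count itself (1, 2 or 4).
// Returns 0 if the remaining data does not divide evenly between the records.
inline std::uint64_t guess_record_size(std::uint64_t file_size,
                                       std::uint64_t record_count,
                                       std::uint64_t count_size = 4) {
	if(record_count == 0 || file_size < count_size)
		return 0;

	const std::uint64_t payload = file_size - count_size; // Data after the record count
	if(payload % record_count != 0)
		return 0;

	return payload / record_count;
}
```

Using the example above, guess_record_size(48, 4) returns 11. It is worth trying a count_size of 1 and 2 as well if 4 gives no sensible result.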

As for decoding each record of a sub-file, it really is just a case of guesswork and trying to identify patterns. I have found that a good starting point is to try to identify potential references to primary keys (which will be ints - i.e. 4 bytes) and cross-reference these to the ID values shown in the Editor when viewing the saved game. For example, if you're looking at club histories there will probably be references to Club IDs and Club Competition IDs. Similarly, player stats will likely include a Staff ID. However, it might be possible that the record ID doubles up as a reference to the Club ID/Staff ID, etc.

Worked Example: HostCountry.tmp
Opening up HostCountry.tmp from my example 1974/75 saved game looks like this (as an aside, note the right hand margin where there is a very clear repeating pattern which suggests that there is a fixed size of record in this sub-file):

You will see that the first four bytes are 67 00 00 00. Let's assume that is an int denoting the record count, so that's 103 in decimal format. According to Windows the file size is 6,596 bytes (right-click on the file, click on Properties and look for the Size property. Ignore the Size on Disk property). 6,596 - 4 = 6,592 bytes. Dividing this figure by 103 = 64. So it looks like each record is 64 bytes. Deleting the first four bytes from the file and setting HHD to show 64 bytes per row shows this (to do this in HHD click on View -> Columns -> Custom and enter 64):

Now there definitely appears to be a pattern emerging, so we can be pretty confident we've figured out the size of each record. The next task is to figure out the fields in each record. It doesn't look like there is a record ID at the start of each record as otherwise the first row would be 00 00 00 00, the second row 01 00 00 00, the third row 02 00 00 00, etc. The HostCountry.tmp sub-file will inevitably include a Nation ID to indicate the host country and a Club Competition ID to indicate the club competition. There's probably also a short to indicate the season/year.
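
The check for a leading record ID can also be done in code rather than by eye. The sketch below (the function name is made up) takes the record data with the initial record count already removed and tests whether every record starts with an int equal to its row number.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if every record begins with a 4-byte little endian int equal
// to its row number (0, 1, 2, ...), which is what a leading record ID would
// look like. records must not include the initial record count.
inline bool starts_with_record_ids(const std::vector<unsigned char> &records,
                                   std::size_t record_size) {
	if(record_size < 4 || records.empty() || records.size() % record_size != 0)
		return false;

	const std::size_t count = records.size() / record_size;
	for(std::size_t row = 0; row < count; ++row) {
		const unsigned char *p = &records[row * record_size];
		const std::uint32_t id =
			static_cast<std::uint32_t>(p[0]) |
			(static_cast<std::uint32_t>(p[1]) << 8) |
			(static_cast<std::uint32_t>(p[2]) << 16) |
			(static_cast<std::uint32_t>(p[3]) << 24);
		if(id != row)
			return false;
	}
	return true;
}
```

A false result only rules out an int ID at the very start of each record; the ID could still be a char or short, or sit later in the record.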

Let's take the following record as an example:

Seeing as this is a 1974/75 database, it's worth checking what 1974 is in hex (you can use the Windows Calculator to do this). As a short 1974 is B6 07. So if we see anything which looks like B6 07 or thereabouts then it's probably a year. As it happens, the above example includes a D6 07 (at byte positions 06 and 07) which is 2006 in decimal. I suspect that the immediately preceding F9 might be a char representing the day of the year (e.g. 0 = 1 Jan, 1 = 2 Jan, etc). So F9 = 249. Seeing as zero represents the first day of the year, 249 represents the 250th day of the year = 7 September.
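
Converting a zero-based day of the year into a calendar date is fiddly to do by hand, so here is a small sketch which does it. The names are made up, and it simply assumes (as per my guess above) that 0 = 1 Jan.

```cpp
struct DayMonth {
	int day;   // 1-31
	int month; // 1-12
};

inline bool is_leap_year(int year) {
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Convert a zero-based day of the year (0 = 1 Jan) into a day and month.
// Returns {0, 0} if day_of_year is out of range for the given year.
inline DayMonth from_day_of_year(int day_of_year, int year) {
	const int days_in_month[12] = {31, is_leap_year(year) ? 29 : 28, 31, 30, 31, 30,
	                               31, 31, 30, 31, 30, 31};
	if(day_of_year < 0)
		return {0, 0};

	int remaining = day_of_year;
	for(int month = 0; month < 12; ++month) {
		if(remaining < days_in_month[month])
			return {remaining + 1, month + 1};
		remaining -= days_in_month[month]; // Move on to the next month
	}
	return {0, 0};
}
```

Calling from_day_of_year(249, 2006) gives day 7 of month 9, i.e. 7 September.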

It looks like there are a few sections which might be ints starting at positions 00, 08 and 0C (amongst others). So that's the following ints in hex which convert to decimal as follows:

42 02 00 00 = 578
35 00 00 00 = 53
61 00 00 00 = 97

By exporting model spreadsheets from the Editor for Club Competitions and Nations we can look up Club Competition IDs and Nation IDs (go to the Club Competitions screen in the Editor and click on Export -> Model and then do the same from the Nations screen). Interestingly, Club Competition ID 578 = World Junior Championships U-20 Div 1, Nation ID 53 = Denmark and Nation ID 97 = Italy. So it's looking like we've identified the fields for the club competition, date/year and two or more host countries. Obviously if you sim to 2006 (or find an earlier example) in-game then you can verify the findings.
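
Putting the findings from this worked example together, the first 16 bytes of each 64-byte record could be decoded along these lines. This is only a sketch of my guesses so far: the struct and function names are made up, the day field is unconfirmed and the byte at position 04 is still unknown.

```cpp
#include <cstddef>
#include <cstdint>

// Fields guessed from the worked example (first 16 bytes of a 64-byte record)
struct HostCountryGuess {
	std::int32_t comp;   // Club Competition ID (position 00)
	int day;             // Possible zero-based day of the year (position 05), unconfirmed
	std::int16_t year;   // Year (position 06)
	std::int32_t host_1; // Nation ID of the first host (position 08)
	std::int32_t host_2; // Nation ID of the second host (position 0C)
};

inline std::int32_t int_at(const unsigned char *data, std::size_t offset) {
	return static_cast<std::int32_t>(
		static_cast<std::uint32_t>(data[offset]) |
		(static_cast<std::uint32_t>(data[offset + 1]) << 8) |
		(static_cast<std::uint32_t>(data[offset + 2]) << 16) |
		(static_cast<std::uint32_t>(data[offset + 3]) << 24));
}

inline std::int16_t short_at(const unsigned char *data, std::size_t offset) {
	return static_cast<std::int16_t>(
		static_cast<std::uint16_t>(data[offset] | (data[offset + 1] << 8)));
}

// record must point to at least 16 bytes
inline HostCountryGuess decode_host_country(const unsigned char *record) {
	HostCountryGuess guess;
	guess.comp = int_at(record, 0x00);
	guess.day = record[0x05];
	guess.year = short_at(record, 0x06);
	guess.host_1 = int_at(record, 0x08);
	guess.host_2 = int_at(record, 0x0C);
	return guess;
}
```

You could then loop over all of the records and print the IDs next to the names exported from the Editor to see whether the pattern holds for every record.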

Further Reading
This thread has some discussion and findings from back in 2013 when we were looking at EHM 2007 saved games: viewtopic.php?t=10423&start=25

Post by **archibalduk** » Mon Feb 27, 2023 9:23 pm

Here is what I was able to decode from EHM 2007 saved games several years ago. This might also work for EHM 1 saved games but I haven't checked.

NOTE 1: "Unk" or "Unk_xx" indicates an unknown field. E.g. "short Unk_05" indicates an unknown short field. It should also be noted that consecutive Unk fields might actually be a single unknown field.
NOTE 2: There are a few references to "SI_DATE" which is a 5 byte struct detailed in the EHM 2007 DB Structure as follows:

Code: Select all

struct SI_DATE
{
short day;
short year;
bool leap_year;
};
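
One thing to watch out for if you copy this struct into your own C++ code: compilers will typically pad it to 6 bytes so that the shorts stay aligned, which means you cannot reliably read the 5 bytes from the file straight into it. A safer approach is to decode the fields one at a time, as in this sketch (SiDate and read_si_date are made-up names):

```cpp
#include <cstddef>
#include <cstdint>

struct SiDate {
	std::int16_t day;
	std::int16_t year;
	bool leap_year;
};

// Size of an SI_DATE as stored in the file (2 + 2 + 1 bytes)
constexpr std::size_t SI_DATE_SIZE_ON_DISK = 5;

// Decode an SI_DATE from 5 bytes of little endian data
inline SiDate read_si_date(const unsigned char *data) {
	SiDate date;
	date.day = static_cast<std::int16_t>(static_cast<std::uint16_t>(data[0] | (data[1] << 8)));
	date.year = static_cast<std::int16_t>(static_cast<std::uint16_t>(data[2] | (data[3] << 8)));
	date.leap_year = data[4] != 0; // Any non-zero value counts as true
	return date;
}
```

After reading one date, move on by SI_DATE_SIZE_ON_DISK bytes rather than by sizeof(SiDate).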

contract.dat
Credit to Lazion who figured out much of this: viewtopic.php?p=150158#p150158

The file begins with 12 bytes as follows:

8 bytes of unknown data
An int to indicate the initial record count

Each initial record then appears as listed below. However, in EHM 1 each record is 252 bytes long which means that the below is out by one byte (the below adds up to 253 bytes).

There appears to be further data at the end of the file (i.e. after the initial records) and we haven't yet figured that out.

Code: Select all

int StaffID;
int ClubContracted;
char Unk_1[8];
int CapHit_wk;
int Salary01_wk;
int Salary02_wk;
int Salary03_wk;
int Salary04_wk;
int Salary05_wk;
int Salary06_wk;
int Salary07_wk;
int Salary08_wk;
int Salary09_wk;
int Salary10_wk;
char BonusAchieved;
char BonusGP_amount;
int BonusGP_value;
char BonusGoals_amount;
int BonusGoals_value;
char BonusAssists_amount;
int BonusAssists_value;
char BonusPoints_amount;
int BonusPoints_value;
char BonusPPG_amount;
int BonusPPG_value;
char BonusSHG_amount;
int BonusSHG_value;
char BonusWins_amount;
int BonusWins_value;
char BonusShutouts_amount;
int BonusShutouts_value;
char BonusGAA_amount;
int BonusGAA_value;
char BonusSaves_amount;
int BonusSaves_value;
char BonusPlusMinus_amount;
int BonusPlusMinus_value;
char Unk_2[49];
bool Clause_NHLRelease;
char Clause_PlayerOptionYear; // Might be bools
char Clause_NoTrade;
char Clause_TwoWay;
char Clause_Relegation;
char Clause_ProfessionalOffer;
char Salary_TwoWay; // Percentage
SI_DATE ContractStart; // Might be Date Joined rather than Started (byte offset 172)
SI_DATE ContractExpiry;
char ContractType;
char Unk_3[13];
char Unk_4[13]; // Probably the future settings

int CapHit_yr;
int Salary01_yr;
int Salary02_yr;
int Salary03_yr;
int Salary04_yr;
int Salary05_yr;
int Salary06_yr;
int Salary07_yr;
int Salary08_yr;
int Salary09_yr;
int Salary10_yr;
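
If you want to check the arithmetic on the above, one option is to express the layout as a packed C++ struct and let the compiler add it up. This is only a sketch: I have grouped the repeated salary and bonus fields into arrays to keep it short (the grouping and the Bonus struct are mine rather than anything from the game), and it relies on #pragma pack, which MSVC, GCC and Clang all support.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1) // No padding between fields, to mirror the layout in the file

struct SiDate { std::int16_t day; std::int16_t year; bool leap_year; };
struct Bonus { char amount; std::int32_t value; };

struct ContractRecord {
	std::int32_t StaffID;
	std::int32_t ClubContracted;
	char Unk_1[8];
	std::int32_t CapHit_wk;
	std::int32_t Salary_wk[10]; // Salary01_wk to Salary10_wk
	char BonusAchieved;
	Bonus Bonuses[11]; // GP, Goals, Assists, Points, PPG, SHG, Wins, Shutouts, GAA, Saves, PlusMinus
	char Unk_2[49];
	bool Clause_NHLRelease;
	char Clause_PlayerOptionYear;
	char Clause_NoTrade;
	char Clause_TwoWay;
	char Clause_Relegation;
	char Clause_ProfessionalOffer;
	char Salary_TwoWay;
	SiDate ContractStart;
	SiDate ContractExpiry;
	char ContractType;
	char Unk_3[13];
	char Unk_4[13];
	std::int32_t CapHit_yr;
	std::int32_t Salary_yr[10]; // Salary01_yr to Salary10_yr
};

#pragma pack(pop)

static_assert(sizeof(SiDate) == 5, "SI_DATE should be 5 bytes");
static_assert(sizeof(ContractRecord) == 253, "The fields above add up to 253 bytes");
static_assert(offsetof(ContractRecord, ContractStart) == 172, "ContractStart is at byte 172");

// Position within contract.dat of a given initial record (after the 12 byte header)
inline std::size_t contract_record_offset(std::size_t record_index) {
	return 12 + record_index * sizeof(ContractRecord);
}
```

This confirms the 253 byte total for the fields as listed. For EHM 1 (252 bytes per record) one of the fields would need to shrink by a byte before the offsets line up.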

fixtures_xxxx.tmp
The file begins with an int which indicates the record count.

In EHM 2007 each record then appears as listed below (76 bytes per record). In EHM 1 it appears this is now 84 bytes per record.

Code: Select all

short Comp;
short Arena;
short ID;
short Played; // Might be two chars (the second char being the Played field)
short ClubHome;
short ClubRoad;
short Unk_05;
short Unk_06;
short Game1_year;
short Game1_day;
short Game2_year;
short Game2_day;
short Game3_year;
short Game3_day;
short Unk_07;
short Unk_08;
short Unk_09;
short Unk_ID; // This might be a game ID
short Unk_10;
short Unk_11;
short Unk_12;
short Unk_13;
short Unk_14;
char Unk_15;
char Score_RegularTime_Home;
char Score_RegularTime_Road;
char Score_OT_Home;
char Score_OT_Road;
char Goals_PenaltyShots_Home;
char Goals_PenaltyShots_Road;
char Score_Aggregate_Home;
char Score_Aggregate_Road;
char Goals_Period1_Home;
char Goals_Period1_Road;
char Goals_Period2_Home;
char Goals_Period2_Road;
char Goals_Period3_Home;
char Goals_Period3_Road;
char Goals_OT_Home;
char Goals_OT_Road;
char PP_Home;
char PPG_Home;
char PP_Road;
char PPG_Road;
char SHG_Home;
char SHG_Road;
char Unk_30;
char Unk_31;
char Unk_32;
char Unk_33;
char Unk_34;
char Unk_35;
char Unk_36;
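
As an example of how the above could be used for exporting results, the sketch below pulls the two clubs and the regular time score out of a single 76-byte EHM 2007 record. The offsets are simply counted from the field list above (23 shorts = 46 bytes, followed by the chars), the names are made up, and I have assumed the score chars are unsigned. The offsets will differ for the 84-byte EHM 1 records.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t FIXTURE_RECORD_SIZE = 76; // EHM 2007
constexpr std::size_t OFFSET_CLUB_HOME = 8;     // 5th short
constexpr std::size_t OFFSET_CLUB_ROAD = 10;    // 6th short
constexpr std::size_t OFFSET_SCORE_HOME = 47;   // 2nd char after the 23 shorts
constexpr std::size_t OFFSET_SCORE_ROAD = 48;   // 3rd char after the 23 shorts

struct FixtureSummary {
	std::int16_t club_home;
	std::int16_t club_road;
	int score_home; // Score_RegularTime_Home
	int score_road; // Score_RegularTime_Road
};

inline std::int16_t fixture_short_at(const unsigned char *data, std::size_t offset) {
	return static_cast<std::int16_t>(
		static_cast<std::uint16_t>(data[offset] | (data[offset + 1] << 8)));
}

// record must point to one complete record of FIXTURE_RECORD_SIZE bytes
inline FixtureSummary summarise_fixture(const unsigned char *record) {
	FixtureSummary summary;
	summary.club_home = fixture_short_at(record, OFFSET_CLUB_HOME);
	summary.club_road = fixture_short_at(record, OFFSET_CLUB_ROAD);
	summary.score_home = record[OFFSET_SCORE_HOME];
	summary.score_road = record[OFFSET_SCORE_ROAD];
	return summary;
}
```

Record N starts at byte 4 + (N x 76) of the sub-file, allowing for the initial record count.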

HostCountry.tmp
The file begins with an int which indicates the record count. Each record then appears as follows (30 bytes per record):

Code: Select all

int Comp;
short Year;
int Host_1;
int Host_2;
char Unk1[20];

iihf_rankings.dat

Code: Select all

int NationID;
unsigned short Unk_01;
unsigned short Unk_02;
unsigned short Unk_03;
unsigned short Unk_04;
unsigned short Unk_05;
unsigned short Unk_06;
short RankYear1;
short RankYear2;
int ScoreYear1;
int ScoreYear2;
int ScoreYear3;
int ScoreYear4;
int ScoreYear5;
int ScoreYear6;
int ScoreTotal;

Post by **archibalduk** » Mon Feb 27, 2023 9:24 pm

I should add that I haven't really proofread the above as it's taken so long to write. I'll probably proofread it next weekend and I'll add some details of what was previously decoded from EHM 2007 sub-files (which might be the same as EHM 1 format sub-files).

Post by **archibalduk** » Thu Mar 02, 2023 5:54 pm

I've updated the second post of this thread with what I was able to decode for EHM 2007.