-
Apprentice
Performant damage over time
Hi,
I have been working on setting up a system to allow various damage over time effects, e.g. burn for fire spells. So far I have been only experimenting with LeGo buffs and using them I have been able to set up a somewhat working system where the damage ticks every second for example.
However, if I try to increase the tick rate to more than a few times per second, the performance becomes a little choppy when there are a lot of buffs going around, e.g. a lot of burn debuffs from a fire rain spell. I can't help but wonder if there is some more performant approach I could explore.
Even if there are no other easy approaches, perhaps someone can simply confirm that these LeGo buffs can be performant even if there are many of them ticking simultaneously! Any ideas or hints would be much appreciated.
-
Apprentice
So, I think I found a potential solution.
My idea is to modify the existing but unused HP/Mana regeneration mechanic (with ATR_REGENERATEHP and ATR_REGENERATEMANA) so that these attributes instead control in which direction, and by how much, HP/Mana is changed at a fixed interval.
I was able to find the bit inside oCNpc::Regenerate that handles this mechanic using IDA. For example, this is the part that handles the regeneration for HP.
Code:
00742030 - push 1 ; int
00742032 - push 0 ; int
00742034 - mov ecx, esi ; this
00742036 - call ?ChangeAttribute@oCNpc@@QAEXHH@Z ; oCNpc::ChangeAttribute(int,int)
0074203B - fild dword ptr [esi+1D0h]
00742041 - fmul ds:__real@447a0000
00742047 - fstp dword ptr [esi+7C4h]
The part up until the ChangeAttribute call handles the regeneration: 1 is the amount, 0 is the attribute index for hitpoints. The part after handles resetting the timer for the regen. It loads the integer (as float) from ATR_REGENERATEHP and multiplies it by 1000 to get milliseconds, then stores it for the next cycle.
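As a quick sanity check on that constant: the symbol name __real@447a0000 carries the raw IEEE-754 bit pattern of the float, and it can be decoded outside the game. The Python sketch below only models the arithmetic described above. The function names are made up for illustration and nothing here touches the engine.

```python
import struct

def float_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def regen_timer_ms(atr_regenerate: int) -> float:
    """Model of 'fild [esi+1D0h]; fmul __real@447a0000': seconds -> milliseconds."""
    return float(atr_regenerate) * float_from_bits(0x447A0000)

print(float_from_bits(0x447A0000))  # 1000.0
print(regen_timer_ms(5))            # 5000.0
```

So with the unmodified engine an attribute value of 5 means "one point every 5000 ms", which is why the attribute cannot be reused as an amount without also replacing the timer calculation.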
It is clear to me what needs to be done:
* Set the amount to be read from the attribute instead of using constant 1.
* Set the timer to be stored from constant value instead of reading it from the attribute.
I think I might be able to figure out how to load and push the attribute value for the ChangeAttribute call, but the bit with the floating points is a bit difficult... I don't really know how to "load" a constant and then store it.
Any help would be much appreciated!
PS, in case you are wondering:
Beyond that, my plan would be to use the LeGo buffs to increase/decrease these attributes on hero/NPCs according to what is happening in the game. This should probably solve the main issue I'm having with LeGo buff based damage-over-time effects: choppy performance with many overlapping buffs ticking at the same time.
EDIT
I realize I could also just override that "Push 1" into a "Push 0" and use a hook to actually handle the damage/heal effects. This could give some more flexibility, though I don't know if that might have the same performance issue I am having with the buffs. In any case, I would still need to somehow override the timer with a constant value.
Last edited by LootaBox (19.06.2021 at 17:46)
-
Apprentice
Alright, made some progress, but still struggling.
First I noticed a bug: mana regen is actually looking at ATR_REGENERATEHP to check whether it triggers at all. The following snippet will fix it and could be used independently (in case some madman wanted to use the regen mechanic as it is).
Code:
// Fix: mana regen triggers if ATR_REGENERATEMANA != 0 (instead of ATR_REGENERATEHP)
MemoryProtectionOverride(7610451, 1);
MEM_WriteByte(7610451 /* 0x742053 */, 212 /* 0xd4 */); // [esi+1D0h] -> [esi+1D4h]
Honestly, no clue how the MemoryProtectionOverride is supposed to be used, but that works.
Despite my initial trepidation, I managed to set the timer to a static value, rather than a number of seconds determined by the attribute. With the following snippet it is pretty easy to control the frequency at which this regen/degen is applied.
Code:
const float intervalMs = 200.0;
const int oCNpc__Regenerate__Life_SetTimer_Start = 7610427; // 0x74203B
const int oCNpc__Regenerate__Life_SetTimer_End = 7610439; // 0x742047
const int oCNpc__Regenerate__Mana_SetTimer_Start = 7610503; // 0x742087
const int oCNpc__Regenerate__Mana_SetTimer_End = 7610515; // 0x742093
// Fill with NOPs (ultimately only replaces fild(6) -> NOP(x6))
repeat(i1, oCNpc__Regenerate__Life_SetTimer_End - oCNpc__Regenerate__Life_SetTimer_Start); var int i1;
MEM_WriteByte(oCNpc__Regenerate__Life_SetTimer_Start + i1, 144); // NOP | 0x90
end;
repeat(i2, oCNpc__Regenerate__Mana_SetTimer_End - oCNpc__Regenerate__Mana_SetTimer_Start); var int i2;
MEM_WriteByte(oCNpc__Regenerate__Mana_SetTimer_Start + i2, 144); // NOP | 0x90
end;
// Replace fmul(6) -> fld(6) using *intervalMs
var int intervalPtr; intervalPtr = MEM_GetFloatAddress(intervalMs);
MEM_WriteByte(oCNpc__Regenerate__Life_SetTimer_End - 6, 217); // fld | 0xd9
MEM_WriteByte(oCNpc__Regenerate__Life_SetTimer_End - 5, 5); // fld | 0x05
MEM_WriteInt (oCNpc__Regenerate__Life_SetTimer_End - 4, intervalPtr);
MEM_WriteByte(oCNpc__Regenerate__Mana_SetTimer_End - 6, 217); // fld | 0xd9
MEM_WriteByte(oCNpc__Regenerate__Mana_SetTimer_End - 5, 5); // fld | 0x05
MEM_WriteInt (oCNpc__Regenerate__Mana_SetTimer_End - 4, intervalPtr);
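For anyone cross-checking the bytes written above: `fld dword ptr [addr]` is encoded as opcode D9, ModRM byte 05, then the 32-bit little-endian address. That is six bytes, the same length as the fmul it replaces. A small offline Python sketch of the resulting 12-byte block follows. The address is a placeholder, not a real pointer from the game, and the helper names are invented for this example.

```python
import struct

NOP = 0x90

def encode_fld_m32(addr: int) -> bytes:
    """fld dword ptr [addr]: opcode D9, ModRM 05 (absolute disp32), address."""
    return bytes([0xD9, 0x05]) + struct.pack("<I", addr)

def patched_timer_block(interval_ptr: int) -> bytes:
    """Bytes replacing 'fild [esi+1D0h]; fmul [const]': six NOPs, then fld."""
    return bytes([NOP] * 6) + encode_fld_m32(interval_ptr)

block = patched_timer_block(0x12345678)  # placeholder address
print(block.hex(" "))  # 90 90 90 90 90 90 d9 05 78 56 34 12
```

The untouched fstp that follows then stores the loaded constant into the timer field, so the interval no longer depends on the attribute.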
The only bit I can't figure out is actually replacing that push 1 with what basically should be push dword ptr [esi+1D0h]. The trouble is that the original instruction is 2 bytes and what I need is 3 bytes. As I've made some NOPs in the addresses following the call instruction, I assumed I could just "move" it and all preceding instructions forward one byte in memory to make space for the larger push instruction, and I've tried several ways to do this, but I can't get anything working. I feel like I'm missing something obvious, but I can't see it.
Any help or hints appreciated, i.e. how to move instructions around in memory; perhaps there is some magic between the bytes that I should be aware of. I am also wondering how this memory protection override should actually be used.
Last edited by LootaBox (27.06.2021 at 13:34)
-
I can't really give you a complete solution, but at least I can fill in some bits and pieces to help you understand.
Let's start with MemoryProtectionOverride. At the lowest level, all regions of memory are the same, i.e. the region where the code is placed is in the same physical RAM as the data (e.g. the name of an NPC, or the value of ATR_REGENERATEHP for a given NPC). This is called a von Neumann architecture. There are devices where this is not true (a Harvard architecture), but that is generally only found in e.g. embedded devices and hence not relevant for us (consumer hardware is always von Neumann).
However, the operating system still distinguishes these regions. To protect against malicious code, the region where the code from an executable is located is marked as read-only. In other words: you're not allowed to modify any of the bytes in that region. This is helpful even if there's no malicious code involved, because accidentally overwriting those bytes is generally a bug which could cause unforeseen consequences and might be hard to catch.
You can, however, change the permissions for any given address with a call to the Windows kernel, which is exactly what MemoryProtectionOverride does. The second parameter to MemoryProtectionOverride is the length of the memory region which should be made writeable. If you only need to overwrite one byte, you only need a length of 1 (MEM_WriteByte used to always write 4 bytes, but I think in the newest version of Ikarus it writes exactly 1 byte due to another bug we encountered).
Ok this was quite long-winded, the next part (why can't you simply move the call) is a bit simpler. Both jumps and calls are generally encoded using relative offsets, i.e. they do not contain the absolute address which should be executed. Instead they contain an offset which is added to the current address. Hence moving the instruction, even slightly, will cause a crash most of the time. It's generally easier to avoid moving such an instruction, but if you absolutely have to do it, you will need to adjust the offset likewise.
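To make the relative-offset point concrete, here is a small offline Python sketch of how a 5-byte `E8 rel32` call resolves its target, and what has to happen to the displacement when the instruction is moved. The call address 0x742036 is the one from the listing earlier in the thread. The target address is purely hypothetical, and the function names are invented for this example.

```python
import struct

def call_target(call_addr: int, rel32: int) -> int:
    """Target of 'E8 rel32': the offset is relative to the NEXT instruction."""
    return (call_addr + 5 + rel32) & 0xFFFFFFFF

def encode_call(call_addr: int, target: int) -> bytes:
    """Encode a near call placed at call_addr that jumps to target."""
    rel = (target - (call_addr + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", rel)

def moved_rel32(rel32: int, delta: int) -> int:
    """Displacement after moving the call forward by delta bytes."""
    return (rel32 - delta) & 0xFFFFFFFF

target = 0x00700000                       # hypothetical callee address
old = encode_call(0x742036, target)
old_rel = struct.unpack("<I", old[1:])[0]
new_rel = moved_rel32(old_rel, 4)         # call now sits 4 bytes later

print(hex(call_target(0x742036, old_rel)))      # 0x700000
print(hex(call_target(0x742036 + 4, new_rel)))  # 0x700000
print(hex(call_target(0x742036 + 4, old_rel)))  # 0x700004, wrong target
```

The last line shows what happens if the bytes are copied without adjusting the displacement: the call lands a few bytes past the intended function, which usually ends in a crash.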
-
Quote from LootaBox
Hi,
I have been working on setting up a system to allow various damage over time effects, e.g. burn for fire spells. So far I have been only experimenting with LeGo buffs and using them I have been able to set up a somewhat working system where the damage ticks every second for example.
However, if I try to increase the tick rate to more than a few times per second, the performance becomes a little choppy when there are a lot of buffs going around, e.g. a lot of burn debuffs from a fire rain spell. I can't help but wonder if there is some more performant approach I could explore.
Even if there are no other easy approaches, perhaps someone can simply confirm that these LeGo buffs can be performant even if there are many of them ticking simultaneously! Any ideas or hints would be much appreciated.
Can you try to modify the LeGo "Talents.d" code that does the NPC lookup?
I found that if you "optimize" the lookup for buffed NPCs, you get much better performance in the regular case; in the worst case, you lose a tiny bit of performance because of the newly introduced checks.
Look for "Npc_FindByID" and adjust the code as in this Spoiler. This might also work great for you.
Last edited by Kirides (25.06.2021 at 14:46)
-
Apprentice
Thank you Lehona, this really helped me understand the whole thing better. From my previous snippets the MemoryProtectionOverride length can be reduced to just 1 and I managed to change the "push 1" now to push the appropriate attribute values:
Code:
const int oCNpc__Regenerate__Life_CallChangeAttribute_Start = 7610416; // 0x742030
const int oCNpc__Regenerate__Life_CallChangeAttribute_End = 7610427; // 0x74203B
const int oCNpc__Regenerate__Mana_CallChangeAttribute_Start = 7610492; // 0x74207C
const int oCNpc__Regenerate__Mana_CallChangeAttribute_End = 7610503; // 0x742087
// All call related instructions will be offset, so the call offset itself must first be adjusted
const int offset = 4; var int orig_pos;
orig_pos = MEM_ReadInt(oCNpc__Regenerate__Life_CallChangeAttribute_End - 4);
MEM_WriteInt(oCNpc__Regenerate__Life_CallChangeAttribute_End - 4, orig_pos - offset);
orig_pos = MEM_ReadInt(oCNpc__Regenerate__Mana_CallChangeAttribute_End - 4);
MEM_WriteInt(oCNpc__Regenerate__Mana_CallChangeAttribute_End - 4, orig_pos - offset);
// Offset all CallChangeAttribute related instruction to make space
repeat(k1, oCNpc__Regenerate__Life_CallChangeAttribute_End - oCNpc__Regenerate__Life_CallChangeAttribute_Start); var int k1;
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_End - k1 + (offset - 1), MEM_ReadByte(oCNpc__Regenerate__Life_CallChangeAttribute_End - k1 - 1));
end;
repeat(k2, oCNpc__Regenerate__Mana_CallChangeAttribute_End - oCNpc__Regenerate__Mana_CallChangeAttribute_Start); var int k2;
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_End - k2 + (offset - 1), MEM_ReadByte(oCNpc__Regenerate__Mana_CallChangeAttribute_End - k2 - 1));
end;
// push 1 -> push dword ptr [esi+1D0h]
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 0, 255); // 0xff
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 1, 182); // 0xb6
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 2, 208); // 0xd0
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 3, 1); // 0x01
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 4, 0); // 0x00
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 5, 0); // 0x00
// push 1 -> push dword ptr [esi+1D4h]
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 0, 255); // 0xff
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 1, 182); // 0xb6
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 2, 212); // 0xd4
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 3, 1); // 0x01
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 4, 0); // 0x00
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 5, 0); // 0x00
So, this combined with the previous snippets allows easy control over regen, degen or combination of the two for any NPC. I have not actually done any performance testing yet to see if it is actually any faster, but if nothing else this was a great learning experience for me.
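The shift-and-patch sequence above can be replayed offline on a plain byte buffer to confirm the resulting layout. The Python sketch below does that under a few assumptions: the original bytes are taken to be `6A 01`, `6A 00`, `8B CE`, `E8 rel32` (the usual encodings for the listed instructions), the callee address is hypothetical, and the following bytes are the NOPs written by the timer patch. Since 0x1D0 does not fit into an 8-bit displacement, the new push is 6 bytes long, which is where the offset of 4 comes from.

```python
import struct

BASE = 0x742030        # address of the original 'push 1'
TARGET = 0x00700000    # hypothetical address of oCNpc::ChangeAttribute

def rel32(call_addr: int, target: int) -> int:
    return (target - (call_addr + 5)) & 0xFFFFFFFF

# Assumed original bytes: push 1; push 0; mov ecx, esi; call rel32; 6 NOPs
mem = bytearray(b"\x6A\x01\x6A\x00\x8B\xCE\xE8"
                + struct.pack("<I", rel32(BASE + 6, TARGET))
                + b"\x90" * 6)

START, END, OFFSET = 0, 11, 4   # 11-byte block, new push needs 4 more bytes

# 1) Adjust the call displacement before the call is moved
old = struct.unpack_from("<I", mem, END - 4)[0]
struct.pack_into("<I", mem, END - 4, (old - OFFSET) & 0xFFFFFFFF)

# 2) Copy the block forward, highest byte first so nothing is clobbered
for k in range(END - START):
    mem[END - k + OFFSET - 1] = mem[END - k - 1]

# 3) push dword ptr [esi+1D0h] = FF B6 D0 01 00 00
mem[START:START + 6] = b"\xFF\xB6\xD0\x01\x00\x00"

new_call = BASE + 6 + OFFSET
new_rel = struct.unpack_from("<I", mem, 6 + OFFSET + 1)[0]
resolved = (new_call + 5 + new_rel) & 0xFFFFFFFF
print(mem.hex(" "))
print(hex(resolved))
```

In this model the buffer ends up as the 6-byte push, then push 0, mov ecx, esi and the call, with the call still resolving to the same target and two NOPs left over before the fld.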
There is one issue, another bug it seems, that I have not yet quite resolved... Sometimes the regen just stops, e.g. if you take fall damage, due to this condition:
Code:
fcomp ds:__real@00000000
fnstsw ax
test ah, 41h
jp short loc_742051
After breaking it down, my understanding is that it jumps (skipping regen) while st(0) > 0.0: fcomp leaves C0 and C3 cleared when st(0) is greater than the operand, test ah, 41h then yields zero, and jp is taken on even parity. As far as I can see, what should be on top of the stack is the timer for the regen, as expected. My suspicion is that for some reason this timer (at esi+7C4h inside the function) is overridden somewhere else, but I'm not sure... I will have to look into this a bit more before I even know what to ask about.
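The condition is easy to get backwards, so here is a small Python model of the flag logic for checking it by hand. It assumes the standard x87 behaviour: after fnstsw ax, the condition bits C0, C2 and C3 sit in bits 0, 2 and 6 of AH, and jp jumps when the masked result has an even number of set bits. The helper names are invented for this example.

```python
import math

C0, C2, C3 = 0x01, 0x04, 0x40   # positions of the condition bits inside AH

def fcomp_ah(st0: float, src: float) -> int:
    """Condition bits in AH after 'fcomp src; fnstsw ax' (other bits ignored)."""
    if math.isnan(st0) or math.isnan(src):
        return C3 | C2 | C0     # unordered
    if st0 > src:
        return 0
    if st0 < src:
        return C0
    return C3                   # equal

def jp_taken(st0: float) -> bool:
    """'test ah, 41h; jp': PF is set when the masked result has even parity."""
    masked = fcomp_ah(st0, 0.0) & 0x41
    return bin(masked).count("1") % 2 == 0

for timer in (250.0, 0.0, -16.0):
    print(timer, "skip regen" if jp_taken(timer) else "run regen")
```

According to this model the jump is taken, and regen skipped, only while the value on the FPU stack is greater than 0.0 (or unordered), so anything that resets the timer to a large positive value would make the regen appear to stop.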
I did not end up trying out your idea Kirides, does not seem like I will need it for this one, but I will keep it in mind. It may come in handy if I do need better performance for the buffs for some other reason.
Thanks for all the help.
-
Apprentice
Alright, finally found the time to dive into this again.
The issue was not just with fall damage. My understanding is that PB had intended to reset the timer for this regen mechanic whenever any damage was received. Of course, it was reset according to the original mechanic, where the attribute determined the frequency of the regen in seconds. I don't see the need for this resetting with my changes, so I opted to remove the resetting completely. This was easy enough to do by filling in some NOPs at the very end of oCNpc::OnDamage_Hit.
Here is the full solution:
Code:
// ------------------------------------------------------------------------------
// Fix: mana regen triggers if ATR_REGENERATEMANA != 0 (instead of ATR_REGENERATEHP)
MemoryProtectionOverride(7610451, 1);
MEM_WriteByte(7610451 /* 0x742053 */, 212 /* 0xd4 */); // [esi+1D0h] -> [esi+1D4h]
// ------------------------------------------------------------------------------
// Make timer for regen static and independent of the attribute
const float intervalMs = 1000.0;
const int oCNpc__Regenerate__Life_SetTimer_Start = 7610427; // 0x74203B
const int oCNpc__Regenerate__Life_SetTimer_End = 7610439; // 0x742047
const int oCNpc__Regenerate__Mana_SetTimer_Start = 7610503; // 0x742087
const int oCNpc__Regenerate__Mana_SetTimer_End = 7610515; // 0x742093
// Fill with NOPs (ultimately only replaces fild(6) -> NOP(x6))
repeat(i1, oCNpc__Regenerate__Life_SetTimer_End - oCNpc__Regenerate__Life_SetTimer_Start); var int i1;
MEM_WriteByte(oCNpc__Regenerate__Life_SetTimer_Start + i1, 144); // NOP | 0x90
end;
repeat(i2, oCNpc__Regenerate__Mana_SetTimer_End - oCNpc__Regenerate__Mana_SetTimer_Start); var int i2;
MEM_WriteByte(oCNpc__Regenerate__Mana_SetTimer_Start + i2, 144); // NOP | 0x90
end;
// Replace fmul(6) -> fld(6) using *intervalMs
var int intervalPtr; intervalPtr = MEM_GetFloatAddress(intervalMs);
MEM_WriteByte(oCNpc__Regenerate__Life_SetTimer_End - 6, 217); // fld | 0xd9
MEM_WriteByte(oCNpc__Regenerate__Life_SetTimer_End - 5, 5); // fld | 0x05
MEM_WriteInt (oCNpc__Regenerate__Life_SetTimer_End - 4, intervalPtr);
MEM_WriteByte(oCNpc__Regenerate__Mana_SetTimer_End - 6, 217); // fld | 0xd9
MEM_WriteByte(oCNpc__Regenerate__Mana_SetTimer_End - 5, 5); // fld | 0x05
MEM_WriteInt (oCNpc__Regenerate__Mana_SetTimer_End - 4, intervalPtr);
// ------------------------------------------------------------------------------
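// Remove resetting of regen timers on damage calculation
const int oCNpc__OnDamage_Hit__ResetTimers_Start = 6737613; // 0x66CECD
const int oCNpc__OnDamage_Hit__ResetTimers_End = 6737686; // 0x66CF16
// This range is far away from the byte unprotected above, so make it writeable
// explicitly (harmless if something else has already unprotected it)
MemoryProtectionOverride(oCNpc__OnDamage_Hit__ResetTimers_Start, oCNpc__OnDamage_Hit__ResetTimers_End - oCNpc__OnDamage_Hit__ResetTimers_Start);
// Fill with NOPs
var int address;
repeat(r, oCNpc__OnDamage_Hit__ResetTimers_End - oCNpc__OnDamage_Hit__ResetTimers_Start); var int r;
address = oCNpc__OnDamage_Hit__ResetTimers_Start + r;
// there are unrelated mov and pop instructions in the middle, leave them untouched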
if (address < 6737657 /* 0x66CEF9 */ || address >= 6737664 /* 0x66CF00*/)
&& (address < 6737678 /* 0x66CF0E */ || address >= 6737680 /* 0x66CF10*/) {
MEM_WriteByte(address, 144); // NOP | 0x90
};
end;
// ------------------------------------------------------------------------------
// Make regen use attribute for regen amount per cycle
const int oCNpc__Regenerate__Life_CallChangeAttribute_Start = 7610416; // 0x742030
const int oCNpc__Regenerate__Life_CallChangeAttribute_End = 7610427; // 0x74203B
const int oCNpc__Regenerate__Mana_CallChangeAttribute_Start = 7610492; // 0x74207C
const int oCNpc__Regenerate__Mana_CallChangeAttribute_End = 7610503; // 0x742087
// All call related instructions will be offset, so the call offset itself must first be adjusted
const int offset = 4; var int orig_pos;
orig_pos = MEM_ReadInt(oCNpc__Regenerate__Life_CallChangeAttribute_End - 4);
MEM_WriteInt(oCNpc__Regenerate__Life_CallChangeAttribute_End - 4, orig_pos - offset);
orig_pos = MEM_ReadInt(oCNpc__Regenerate__Mana_CallChangeAttribute_End - 4);
MEM_WriteInt(oCNpc__Regenerate__Mana_CallChangeAttribute_End - 4, orig_pos - offset);
// Offset all CallChangeAttribute related instruction to make space
repeat(k1, oCNpc__Regenerate__Life_CallChangeAttribute_End - oCNpc__Regenerate__Life_CallChangeAttribute_Start); var int k1;
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_End - k1 + (offset - 1), MEM_ReadByte(oCNpc__Regenerate__Life_CallChangeAttribute_End - k1 - 1));
end;
repeat(k2, oCNpc__Regenerate__Mana_CallChangeAttribute_End - oCNpc__Regenerate__Mana_CallChangeAttribute_Start); var int k2;
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_End - k2 + (offset - 1), MEM_ReadByte(oCNpc__Regenerate__Mana_CallChangeAttribute_End - k2 - 1));
end;
// push 1 -> push dword ptr [esi+1D0h]
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 0, 255); // 0xff
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 1, 182); // 0xb6
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 2, 208); // 0xd0
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 3, 1); // 0x01
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 4, 0); // 0x00
MEM_WriteByte(oCNpc__Regenerate__Life_CallChangeAttribute_Start + 5, 0); // 0x00
// push 1 -> push dword ptr [esi+1D4h]
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 0, 255); // 0xff
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 1, 182); // 0xb6
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 2, 212); // 0xd4
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 3, 1); // 0x01
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 4, 0); // 0x00
MEM_WriteByte(oCNpc__Regenerate__Mana_CallChangeAttribute_Start + 5, 0); // 0x00
It is a bit of spaghetti, but it works. It has also not been tested through a full playthrough, but I couldn't find a situation where anything weird or unwanted would occur. It does have two pitfalls: the interval is global, and it is not possible to regenerate slower than 1 hp/mana per interval. This means there is a strange compromise to be made between how small you want the regen/degen amounts to be and how smooth you want them to be...
This is good enough for my needs though.
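To put numbers on that compromise: since the attribute is an integer amount per tick, the smallest nonzero rate is one point per interval. A tiny Python sketch of the arithmetic, with invented function names and nothing engine-specific:

```python
def rate_per_second(amount_per_tick: int, interval_ms: float) -> float:
    """HP (or mana) changed per second for a given per-tick amount."""
    return amount_per_tick * 1000.0 / interval_ms

def smallest_nonzero_rate(interval_ms: float) -> float:
    """The attribute is an integer, so one point per tick is the finest step."""
    return rate_per_second(1, interval_ms)

for interval_ms in (1000.0, 500.0, 200.0):
    print(interval_ms, smallest_nonzero_rate(interval_ms))
```

With a 200 ms interval the ticks look smooth, but the weakest possible effect is already 5 points per second. With 1000 ms the weakest effect is 1 point per second, but it arrives in visible steps.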
Thank you all, I learned a lot working on this and with your help.