RAM issue on specific variable v3.5.0.6 vs v4.0.4.8

Started by PicNoob, Feb 16, 2025, 09:25 PM


PicNoob

Hey guys,

Another annoying issue. Using an 18F26K80, I've identified a memory variable/location that changes on its own when built with the new v4 compiler but not with the v3.5 compiler. There are no program changes between them; I'm just compiling the same program in each. The variable appears to be contaminated with a couple of bits from something else, and I can watch it changing in my interface, which outputs all variables to serial.
 
In v4 it's assigned to:
fake_variable equ 0x73
fake_variableH equ 0x74

Not at the start of a bank or anything weird.

In v3.5 it's assigned to a different location. But the variable at location 0x73/0x74 in the old compiler isn't changing either, so I suspect something in the new compiler is overflowing into 0x73/0x74.

Normally I would just keep using v3.5 and move on at this point but this is an upcoming product and I have time to try to figure it out. And I'm wondering what else might be going wrong for me in v4. I still have a mystery EEPROM issue in some programs with v4 as well so I'm hoping maybe I find a common cause here.

Anyway assembly is not my forte. I've been crawling through looking at the ram address and the variable right in front of it that might be overflowing into it, with little success.

Has anyone ever come across an issue like this?

kcsl

Try and distil your program down to the bare minimum to reproduce the issue. You can then post the smaller program (or maybe arrange to send it to Les) so it can be looked at.

Regards,
Joe
There's no room for optimism in software or hardware engineering.

trastikata

This is not an issue but normal behavior. Proton Compiler creates some internal system variables needed for some math operations or other commands. Between versions the mechanism used for those operations might get improved and changed, which could change the total amount of system variables required, thus the beginning RAM address of the user variables changes too.

As for the EEPROM issue it is not clear what it is, can't say much without more information.

top204

RAM positions are not stationary at compile time.

It depends on what the program is doing and what internal variables have been created and how many Bit variables or Access variables or Heap variables or System variables are created in the program listing.

Version 3.5 is now very old, and a lot has been added and changed in the compilers since then.

For example... Compiler system variables are created for the compiler's library subroutines, comparisons, expressions etc, and these have changed over time. Also, Bit variables are always allocated to Access RAM within 18F devices, and low RAM in 14-bit core types, so the more Bit variables there are in a program listing, the further up the user variables get moved to make room for them, i.e. eight Bit variables per byte of RAM.

Also, if the new "Declare Auto_Variable_Bank_Cross" is used within a program listing, a few empty RAM bytes may appear because a multi-byte variable would have crossed a RAM bank, so its beginning address is moved up to the start of the next RAM bank. This saves both code space and time in a program because of the less frequent need for RAM bank changing assembler mnemonics.
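The bank-cross rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the compiler's actual allocator: the function name `place` is made up for the example, and the only hardware fact assumed is that PIC18 RAM banks are 256 bytes.

```python
BANK_SIZE = 256  # bytes per RAM bank on PIC18 devices


def place(address, size, bank_size=BANK_SIZE):
    """Return the start address for a variable of `size` bytes.

    If the variable would straddle a bank boundary, its start is
    bumped to the first byte of the next bank (leaving a small gap).
    Hypothetical helper: it mimics the behaviour described in the
    post, it is not taken from the compiler.
    """
    first_bank = address // bank_size
    last_bank = (address + size - 1) // bank_size
    if size <= bank_size and first_bank != last_bank:
        return (first_bank + 1) * bank_size
    return address


print(hex(place(0x0FF, 2)))  # a Word at 0xFF would cross into bank 1
print(hex(place(0x073, 2)))  # a Word at 0x73 sits well inside bank 0
```

A two-byte variable at 0x73/0x74 is nowhere near a boundary, so under this rule its address would be left alone.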

Creating Bit variables in Access RAM on 18F devices saves a huge amount of code space and time, because comparisons and loading/reading of the Bits do not need any RAM bank changes, or extra code to cater for a RAM bank change before the comparison and jump etc... Low RAM in the 14-bit core devices is the best it gets with these fragmented RAM devices, but it means the Bit variables are usually in the same RAM bank as each other, and the same RAM bank as compiler system variables, and some of the more commonly used SFRs, so it helps code space and time.

If you want to assign a specific area of RAM for a reason, Dim it as a standard variable or array, then use the AddressOf function to find out where it is located in RAM. Or assign it to a high RAM address that is known not to be affected by the compiler's RAM allocations.

PicNoob

These are all great points but not quite addressing my problem. I'm aware each compiler will map out its own ram addresses when compiling.

The issue is that this program, when compiled in v4, is somehow getting a RAM overwrite at byte 0x73 that it does not get when compiled in v3.5. So I'm looking for ways to prove or disprove that it's a compiler issue, or at least better isolate the cause. I happened to find this one byte being overwritten by chance and I'm concerned there are others I'm not noticing. I use around 1400 bytes of RAM.

For what it's worth, I've been using Declare Auto_Variable_Bank_Cross = On, which both v3.5 and v4 seem to accept. But this isn't near a RAM boundary, so I don't believe it's a boundary issue.

I don't set any ram addresses manually (other than position 0 for the interrupt redirect toggle for the boot loader), and let the compiler run the addressing, so I'm assuming it should be impossible for a ram overwrite to occur. For example in the past even if I accidentally assign a word value to a byte, the compiler truncates one of the bytes or generates an error at compile time.

FWIW I've been working on other areas of the program making it a bit larger, and noticed the issue jumped over to ram byte 0x77 overflowing. So I can make it move around based on program size.



 

PicNoob

Quote from: kcsl on Feb 17, 2025, 07:56 AM
Try and distil your program down to the bare minimum to reproduce the issue. You can then post the smaller program (or maybe arrange to send it to Les) so it can be looked at.

Regards,
Joe


Yes, this was my original approach. The problem is it's a 32k program with 1300 RAM variables, with only a fraction of those output to serial. I have not noticed any overwriting in smaller programs using the same compiler and processor. So now that I have a version of the program where I can recreate it, I'm hoping for some tips to isolate what is going on in the assembly.

Not opposed to sending Tim the program but I was hoping to first find the specific problem in the assembly to point out, to make it easier for him.

kcsl

Just out of curiosity, are you using the HEAP keyword with any of your variables?
There's no room for optimism in software or hardware engineering.

PicNoob

Quote from: kcsl on Feb 18, 2025, 07:17 AM
Just out of curiosity, are you using the HEAP keyword with any of your variables?

Not using any HEAP, although I have it declared just in case. These are the current declares.

        Device = 18F26K80
        Xtal = 64             
        Declare Optimiser_Level = 1   
                                                     
        Declare Compiler_Start_Address = 0x800
        'Declare proton_start_address = 0x800
       
        Declare Reminders = Off
        Declare Stack_Size = 10       
        Symbol spi_delay = 20
       
        Declare Auto_Variable_Bank_Cross = On
        Declare Bootloader = On
        Declare Signed_Right_Shifts = Off
        Declare Auto_Heap_Arrays = On
       

PicNoob

I don't think it's related to my problem, but out of curiosity, I noticed my manually addressed variable appears to have the same RAM spot as a compiler system variable. Am I interpreting that properly?

From the ASM:

; ADDRESSED VARIABLES
bootloader_active equ 0x00

; COMPILER SYSTEM VARIABLES
BPF equ 0x00
BPFH equ 0x01
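One way to check the whole listing for this kind of overlap, rather than spotting it by eye, is to collect every `equ` line and group the symbols by address. The Python sketch below assumes the listing uses the plain `name equ 0xNN` form shown above; the function names are made up for the example, and other address notations or intentional aliases in a real listing would need extra handling.

```python
import re
from collections import defaultdict

# Matches lines such as "BPFH equ 0x01" (hex) or "Foo equ 12" (decimal).
EQU = re.compile(r"^\s*(\w+)\s+equ\s+(0x[0-9A-Fa-f]+|\d+)\b", re.IGNORECASE)


def parse_equ(text):
    """Return {symbol: address} for every equ line in an asm listing."""
    symbols = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0]  # drop assembler comments
        match = EQU.match(line)
        if match:
            raw = match.group(2)
            base = 16 if raw.lower().startswith("0x") else 10
            symbols[match.group(1)] = int(raw, base)
    return symbols


def shared_addresses(symbols):
    """Return {address: [symbols]} for addresses used more than once."""
    by_address = defaultdict(list)
    for name, address in symbols.items():
        by_address[address].append(name)
    return {a: names for a, names in by_address.items() if len(names) > 1}


LISTING = """
; ADDRESSED VARIABLES
bootloader_active equ 0x00
; COMPILER SYSTEM VARIABLES
BPF equ 0x00
BPFH equ 0x01
"""

for address, names in shared_addresses(parse_equ(LISTING)).items():
    print(hex(address), names)
```

Not every shared address is a fault (aliases share addresses on purpose), so the output is a list of places to look, not a list of bugs.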

Stephen Moss

Quote from: PicNoob on Feb 19, 2025, 03:18 AM
I don't think it's related to my problem, but out of curiosity, I noticed my manually addressed variable appears to have the same RAM spot as a compiler system variable. Am I interpreting that properly?

From the ASM:

; ADDRESSED VARIABLES
bootloader_active equ 0x00

; COMPILER SYSTEM VARIABLES
BPF equ 0x00
BPFH equ 0x01

May not be related, but are you using the ORG command to try and place the program code above the boot loader? If so perhaps that is the issue as the manual states....
Quote
Org is actually an assembler directive, so the compiler has no control over it, and if using procedures, it will cause assembler errors, or the program not to work correctly. Only use the Org directive if you know what the underlying assembler code, that the compiler creates, is doing.

trastikata

Without seeing and testing the actual part of the code, causing the issue, it would be just blind guessing.

top204

If your program is using the "unofficial" 'At' directive, the standard Dimed variables will not move around the variables placed using 'At'. They will just be created at the RAM address used as the address parameter after 'At', if they are assigned to it by the compiler.

The 'At' directive is not an official part of the compiler and on recent compiler versions I did have a warning, stating it was not an actual part of the compiler's official syntax, when it was used, but removed it because I thought users would know that a variable placed at a particular area in RAM is of their own doing, and they must be careful to check that standard variables were not using that same RAM area as well.

So I will re-introduce the warning message.

I placed the 'At' directive purely for my tests and any libraries I created when I was writing the compilers, so I had more control of RAM and could check for bank boundaries etc... Getting the standard variables to move around assigned address variables would require a virtual rewrite of the RAM handling mechanisms of the compilers, which is not something I will do lightly, because the slightest mistake or forgetfulness of the mass of inter-connecting code and mechanisms I wrote 20 years ago would cause a catastrophe in programs, for very little reason.

On an 8-bit device, you cannot assign a variable 'At' address $00, and never could. Lower RAM and Access RAM are purely for the compiler's use, and it will place its system variables there. The compiler will not complain that a variable has been placed there, but it will be over-written.

PicNoob

Quote
May not be related, but are you using the ORG command to try and place the program code above the boot loader? If so perhaps that is the issue as the manual states....
Quote
Org is actually an assembler directive, so the compiler has no control over it, and if using procedures, it will cause assembler errors, or the program not to work correctly. Only use the Org directive if you know what the underlying assembler code, that the compiler creates, is doing.

Yes, I am! I've done it this way for around 15 years. I never considered it might be an issue so I'll research that. But I don't believe I use any "procedures". 

I do wonder if placing the boot loader above the code would make more sense... Anyone have a link to example of how that is done exactly? 

PicNoob

Quote from: top204 on Feb 19, 2025, 11:17 AM
If your program is using the "unofficial" 'At' directive, the standard Dimed variables will not move around the variables placed using 'At'. They will just be created at the RAM address used as the address parameter after 'At', if they are assigned to it by the compiler.

The 'At' directive is not an official part of the compiler and on recent compiler versions I did have a warning, stating it was not an actual part of the compiler's official syntax, when it was used, but removed it because I thought users would know that a variable placed at a particular area in RAM is of their own doing, and they must be careful to check that standard variables were not using that same RAM area as well.

So I will re-introduce the warning message.

I placed the 'At' directive purely for my tests and any libraries I created when I was writing the compilers, so I had more control of RAM and could check for bank boundaries etc... Getting the standard variables to move around assigned address variables would require a virtual rewrite of the RAM handling mechanisms of the compilers, which is not something I will do lightly, because the slightest mistake or forgetfulness of the mass of inter-connecting code and mechanisms I wrote 20 years ago would cause a catastrophe in programs, for very little reason.

On an 8-bit device, you cannot assign a variable 'At' address $00, and never could. Lower RAM and Access RAM are purely for the compiler's use, and it will place its system variables there. The compiler will not complain that a variable has been placed there, but it will be over-written.

Interesting! OK I'll move that variable location to the end of RAM to eliminate it as a possible issue. I blame @trastikata for that idea. ;) Somehow, it does seem to be working though. That byte is used to repoint the interrupt function from the boot loader to the main program and interrupts are working. So likely the system variable at 0x00 is simply never updated or used in my program.

top204

If that address of RAM is used only by the bootloader, it will not matter if it is over-written.

All bootloaders use chunks of RAM that are also used by the program they are loading in, because the program will refresh the RAM it is using, and so will a well-written bootloader. Remember, a variable is just a piece of RAM and the variable name is just a better method of remembering an address. So it can be shared, as long as the RAM is not used for long term storage between sections of a program.

Regards
Les

trastikata

I still don't get what the issue is that we are discussing here?

If I understand correctly, you observed that when compiled with v3.5, variables take certain addresses and when v4 is used they do take different addresses? As we said and explained, this is normal behaviour.

If I misunderstood something, please elaborate what exactly is the problem.

PicNoob

Quote from: top204 on Feb 20, 2025, 08:53 AM
If that address of RAM is used only by the bootloader, it will not matter if it is over-written.

All bootloaders use chunks of RAM that are also used by the program they are loading in, because the program will refresh the RAM it is using, and so will a well-written bootloader. Remember, a variable is just a piece of RAM and the variable name is just a better method of remembering an address. So it can be shared, as long as the RAM is not used for long term storage between sections of a program.

Regards
Les

Thanks. It's actually a common variable between both the bootloader and main program to redirect the interrupt vector, so that interrupts function in both. It was definitely a potential issue but doesn't resolve or explain the ram overwrite I get with the same program compiled in v4 vs v3.5.

Code in bootloader:
'****************************************************************
'* Interrupt Service Routine Inside Bootloader
'****************************************************************
HIGH_INTERRUPT_ROUTINE:
Context Save

    If bootloader_active = 1 Then
        GoTo vector_remap
    EndIf
   
    While RCIF = 1 
        serial_in[serial_in_count] = RCREG
       
        Inc serial_in_count
        If serial_in_count > 74 Then
            serial_in_count = 0
        EndIf
    Wend
   
    While RCIF2 = 1 
        serial_in[serial_in_count] = RCREG2
       
        Inc serial_in_count
        If serial_in_count > 74 Then
            serial_in_count = 0
        EndIf
    Wend
   
Context Restore                                                                                                                           


'*****************************************************************************

Org 0xC04
vector_remap:
 

PicNoob

Quote from: trastikata on Feb 20, 2025, 09:17 AM
I still don't get what the issue is that we are discussing here?

If I understand correctly, you observed that when compiled with v3.5, variables take certain addresses and when v4 is used they do take different addresses? As we said and explained, this is normal behaviour.

If I misunderstood something, please elaborate what exactly is the problem.

This discussion went a bit off track because they found and commented on an unrelated issue. In my bootloader I was hard-coding RAM address 0x00 for the vector redirect, but I forgot that the first 100 bytes of RAM or so are reserved for the system. Easy fix: I just moved it to the last byte in RAM.

But the search for my problem at hand continues. When program is compiled in v4 some ram locations are corrupted/overwritten. But when compiled in v3.5 this does not happen.

One interesting theory is a possible difference or issue with how "Declare Compiler_Start_Address = 0xC00" is handled. I'll do some additional testing with that.

I've wanted to place the boot loader at the end of the program for a test but I've never done that before. Do you have any sample code for how that is usually done?

I guess main program starts and then has a bootloader redirect? How are interrupts handled in bootloader if main program is wiped out?

trastikata

Quote from: PicNoob on Feb 20, 2025, 05:06 PM
When program is compiled in v4 some ram locations are corrupted/overwritten. But when compiled in v3.5 this does not happen.

What do you mean by RAM locations get "corrupted/overwritten"? Maybe you meant FLASH memory locations? - then it will make sense?

I can only guess from your posts that you want to say that the bootloader program is larger when compiled in v4.0 compared to v3.5 and gets overwritten/corrupted by the main program because the start address of the main program is not high enough, or the main program gets corrupted because the bootloader area is protected, but in this case your bootloader, if designed properly, should have complained that the bootloader area is being overwritten?

Anyway if this is the case, then in your bootloader adjust the main program starting address to higher offset location, and use that same location as Declare for the start in the main program.

PicNoob

Quote from: trastikata on Feb 20, 2025, 05:44 PM
Quote from: PicNoob on Feb 20, 2025, 05:06 PM
When program is compiled in v4 some ram locations are corrupted/overwritten. But when compiled in v3.5 this does not happen.

What do you mean by RAM locations get "corrupted/overwritten"? Maybe you meant FLASH memory locations? - then it will make sense?

I can only guess from your posts that you want to say that the bootloader program is larger when compiled in v4.0 compared to v3.5 and gets overwritten/corrupted by the main program because the start address of the main program is not high enough, or the main program gets corrupted because the bootloader area is protected, but in this case your bootloader, if designed properly, should have complained that the bootloader area is being overwritten?

Anyway if this is the case, then in your bootloader adjust the main program starting address to higher offset location, and use that same location as Declare for the start in the main program.

I wish the problem was as trivial as bootloader being a different size, etc. The bootloader itself I have not been changing and leaving it in v4, although I did try compiling it back in v3.5 and it had no impact on the issue.

The symptom is that the RAM byte at location 0x73 (the location sometimes moves up or down depending on program size) changes when no assignment is made to it, when compiled in v4. The change happens when other activities in the program execute and other unrelated RAM bytes are changed. When compiled in v3.5 with zero program changes, no such issue occurs.

I believe the compiler is overflowing or miswriting to this RAM address for reasons unknown, and I'm trying to find something in the assembly file that explains the bug/issue. Like the equivalent of assigning an int16 to an int8 address. But I checked nearby variables based on RAM address, and even if those are never modified this problem comes up.
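To make the "int16 written to an int8 address" idea concrete, here is a small Python simulation of RAM as a byte array. It is purely illustrative: the Byte variable at 0x72 and the helper `write_word` are hypothetical, and low-byte-first ordering is assumed because the listing places fake_variable at 0x73 and fake_variableH at 0x74.

```python
def write_word(ram, address, value):
    """Store a 16-bit value low byte first, as a two-byte variable would be."""
    ram[address] = value & 0xFF             # low byte lands where intended
    ram[address + 1] = (value >> 8) & 0xFF  # high byte spills into the next address


ram = bytearray(0x100)  # one 256-byte bank of simulated RAM
ram[0x73] = 0xAA        # stands in for the low byte of fake_variable

# A 16-bit store aimed at a (hypothetical) Byte variable living at 0x72.
write_word(ram, 0x72, 0x0312)

print(hex(ram[0x72]), hex(ram[0x73]))
```

In this model only the byte directly after the target is touched, and only with the high byte of whatever was written, which would match seeing "a couple of bits from something else" appear in the variable. If the variable just below 0x73 is never written, the stray write would have to come from somewhere less obvious, such as indirect (pointer or array index) access.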