(The exact definition of a string is implementation-defined. But that's the concept.)
Ada enforces safe ranges, which means you need to carry the length of the slice somehow. It does not use C's 0-terminated strings. So slicing does not work the same way as strtok or other self-modifying systems - the length isn't guessed, it's known.
But if you change one character in the buffer of the slice, it'll be changed in the original Unbounded_String too.
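A minimal sketch of the "length is known, and the view is shared" idea. It uses a plain String and a renaming rather than Unbounded_String, which is my assumption for illustration. Whether an Unbounded_String slice shares its buffer is up to the implementation. The name Slice_Bounds is made up.

```ada
with Ada.Text_IO;

procedure Slice_Bounds is
   Original : String := "hello world";

   --  A renaming of a slice is a view of the same characters, not a copy.
   Tail : String renames Original (7 .. 11);
begin
   --  The slice carries the bounds it was cut with; nothing scans for a
   --  terminator.
   Ada.Text_IO.Put_Line (Integer'Image (Tail'First));   --  7
   Ada.Text_IO.Put_Line (Integer'Image (Tail'Length));  --  5

   --  Writing through the view changes the original buffer.
   Tail (7) := 'W';
   Ada.Text_IO.Put_Line (Original);  --  hello World
end Slice_Bounds;
```

Indexing Tail outside 7 .. 11 raises Constraint_Error instead of reading past the end.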
For trimming whitespace, you're right that Unbounded's standard Trim may reallocate. It carries multiple buffers, and when you Trim, sometimes it will just hand the original back and other times it'll reallocate. [0] That's mostly a performance tradeoff: keeping the original can make iteration slower, as it holds multiple buffers.
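For comparison, here is roughly what calling the standard Trim looks like with a whitespace set. This is only a usage sketch: the procedure name Trim_Demo and the sample string are made up, and it says nothing about when the library decides to reallocate.

```ada
with Ada.Strings.Maps;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Trim_Demo is
   --  Space, tab, line feed and carriage return.
   Whitespace : constant Ada.Strings.Maps.Character_Set :=
     Ada.Strings.Maps.To_Set (" " & ASCII.HT & ASCII.LF & ASCII.CR);

   Raw : constant Unbounded_String :=
     To_Unbounded_String (ASCII.HT & "  hello " & ASCII.LF);

   --  The Character_Set overload of Trim strips any character in the
   --  given sets from the left and right ends.
   Trimmed : constant Unbounded_String :=
     Trim (Raw, Left => Whitespace, Right => Whitespace);
begin
   Ada.Text_IO.Put_Line ("[" & To_String (Trimmed) & "]");  --  [hello]
end Trim_Demo;
```

The Trim overload that takes only a Trim_End (Left, Right or Both) strips spaces but not tabs or newlines, hence the explicit set.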
So, to implement our own, with one caveat: the code below never slices down to zero length, so in the case of a wholly whitespace string we'll be doing a whole new allocation.
-- This line is just for pasting into godbolt
pragma Source_File_Name (NTrim, Body_File_Name => "example.adb");

with Ada.Strings.Unbounded;
with Ada.Strings.Maps;
use Ada.Strings.Unbounded;

function NTrim (Source : Unbounded_String) return Unbounded_String is
   Len         : constant Natural := Length (Source);
   First, Last : Natural;
   Whitespace  : constant Ada.Strings.Maps.Character_Set :=
     Ada.Strings.Maps.To_Set (" " & ASCII.HT & ASCII.LF & ASCII.CR);
begin
   -- Nothing to trim in an empty string
   if Len = 0 then
      return Source;
   end if;

   -- Skip leading whitespace
   First := 1;
   while First <= Len
     and then Ada.Strings.Maps.Is_In (Element (Source, First), Whitespace)
   loop
      First := First + 1;
   end loop;

   -- Skip trailing whitespace
   Last := Len;
   while Last >= First
     and then Ada.Strings.Maps.Is_In (Element (Source, Last), Whitespace)
   loop
      Last := Last - 1;
   end loop;

   -- Wholly whitespace: hand back a fresh empty string
   if First > Last then
      return To_Unbounded_String ("");
   end if;

   -- Slice out everything between the first and last non-whitespace character
   return Unbounded_Slice (Source, First, Last);
end NTrim;
A few things stand out in the resulting compilation [1]. Our whitespace map gets allocated and deallocated most of the time. A map is harder to treat as a constant, and the compiler doesn't always optimise that nicely. Most of the code is bounds checking. No off-by-one allowed here. Where First is greater than Last, you get a new full allocation.
You need a slice, which has a different length. That is how you do it, without a new allocation.
It's effectively:
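Something like this, as a conceptual sketch. The names Slice_View, Buffer_Access and Slice_Concept are hypothetical and this is not GNAT's actual definition, just the shape of the idea: a reference to shared characters plus the bounds of the part you care about.

```ada
with Ada.Text_IO;

procedure Slice_Concept is
   type Buffer_Access is access all String;

   --  Hypothetical layout, for illustration only.
   type Slice_View is record
      Data  : Buffer_Access;  --  the shared underlying characters
      First : Positive;       --  index of the first character in the view
      Last  : Natural;        --  index of the last character in the view
   end record;

   --  The length comes from the stored bounds, not from scanning the data.
   function Length (V : Slice_View) return Natural is
     (if V.Last >= V.First then V.Last - V.First + 1 else 0);

   Buffer : constant Buffer_Access := new String'("  padded  ");

   --  "Trimming" here only picks new bounds; no characters are copied.
   View : constant Slice_View := (Data => Buffer, First => 3, Last => 8);
begin
   Ada.Text_IO.Put_Line (View.Data (View.First .. View.Last));  --  padded
   Ada.Text_IO.Put_Line (Natural'Image (Length (View)));        --  6
end Slice_Concept;
```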
[0] https://github.com/gcc-mirror/gcc/blob/master/gcc/ada/libgna...
[1] https://godbolt.org/z/x8Erhqn5n