
It returns a new pointer, with the same buffer, as I said already. "There's a different length marked along with the pointer, but that's it."

You need a slice, which has a different length. That is how you do it, without a new allocation.

It's effectively:

    #include <stdlib.h>  // malloc, size_t

    struct String {
      size_t length;
      char *buffer;
    };

    struct String *Unbounded_Slice(struct String *original, size_t Low, size_t High) {
      struct String *slice = malloc(sizeof(struct String));

      // Bounds checking would go here...

      // Low and High are 1-based and inclusive, as in Ada.
      slice->buffer = original->buffer + (Low - 1);
      slice->length = High - Low + 1;
      return slice;
    }
(The exact definition of a string is implementation-defined. But that's the concept.)

Ada enforces safe ranges, which means you need to carry the length of the slice somehow. It does not use C's 0-terminated strings. So slicing does not work the same way as strtok or other self-modifying systems - the length isn't guessed, it's known.
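To make the "known, not guessed" point concrete, here is a minimal sketch in the same C model. The names `StrView` and `view_slice` are made up for illustration, and this is not how any Ada runtime is actually laid out. Because the length travels with the pointer, a bad range can be refused up front and the buffer never needs a terminating NUL. Rejecting empty ranges is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view; for illustration only. */
struct StrView {
    size_t length;
    const char *buffer;
};

/* Take positions low..high (1-based, inclusive).
   Returns 0 and fills *out on success, -1 if the range is refused. */
int view_slice(struct StrView src, size_t low, size_t high, struct StrView *out) {
    if (low < 1 || high > src.length || low > high) {
        return -1; /* empty or out-of-range request: refuse rather than guess */
    }
    out->buffer = src.buffer + (low - 1);
    out->length = high - low + 1;
    return 0;
}

/* Usage: a 5-byte buffer with no terminating NUL works, since the length is carried. */
void view_slice_demo(void) {
    static const char raw[5] = {'h', 'e', 'l', 'l', 'o'};
    struct StrView whole = {5, raw};
    struct StrView part;

    assert(view_slice(whole, 2, 4, &part) == 0);  /* "ell" */
    assert(part.length == 3 && memcmp(part.buffer, "ell", 3) == 0);
    assert(view_slice(whole, 2, 6, &part) == -1); /* past the end: refused */
}
```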

But if you change one character in the buffer of the slice, it'll be changed in the original Unbounded_String too.
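In terms of the C model above, that sharing looks roughly like this. `SharedStr` and `shared_slice` are hypothetical names, bounds are assumed to be checked already, and this only illustrates the model from this comment rather than what a particular Ada runtime does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the struct above, with a writable buffer. */
struct SharedStr {
    size_t length;
    char *buffer;
};

/* Slice positions low..high (1-based, inclusive) without copying characters.
   The caller is assumed to have checked the bounds. */
struct SharedStr shared_slice(struct SharedStr original, size_t low, size_t high) {
    struct SharedStr slice;
    slice.buffer = original.buffer + (low - 1);
    slice.length = high - low + 1;
    return slice;
}

void shared_slice_demo(void) {
    char storage[] = "hello world";
    struct SharedStr original = {11, storage};
    struct SharedStr slice = shared_slice(original, 7, 11); /* "world" */

    slice.buffer[0] = 'W';                       /* write through the slice */
    assert(original.buffer[6] == 'W');           /* the original sees the change */
    assert(slice.buffer == original.buffer + 6); /* same storage, offset by 6 */
}
```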

For trimming whitespace, you're right that Unbounded's standard Trim may reallocate. It carries multiple buffers, and when you Trim, sometimes it will just hand the existing one back and other times it'll reallocate. [0] That's mostly a performance tradeoff: keeping the original can make iteration slower, as it holds multiple buffers.

So, let's implement our own, with one caveat: Slice can't handle 0-length, because range safety is enforced. So in the case of a wholly whitespace string, we'll be doing a whole new allocation.

    -- This line is just for pasting into godbolt
    pragma Source_File_Name (NTrim, Body_File_Name => "example.adb");

    with Ada.Strings.Unbounded;
    with Ada.Strings.Maps;
    use Ada.Strings.Unbounded;

    function NTrim(Source : Unbounded_String) return Unbounded_String is
       Len : constant Natural := Length(Source);
       First, Last : Natural;
       Whitespace : constant Ada.Strings.Maps.Character_Set :=
          Ada.Strings.Maps.To_Set(" " & ASCII.HT & ASCII.LF & ASCII.CR);
    begin
       if Len = 0 then
          return Source;
       end if;

       First := 1;
       while First <= Len and then Ada.Strings.Maps.Is_In(Element(Source, First), Whitespace) loop
          First := First + 1;
       end loop;

       Last := Len;
       while Last >= First and then Ada.Strings.Maps.Is_In(Element(Source, Last), Whitespace) loop
          Last := Last - 1;
       end loop;

       if First > Last then
          return To_Unbounded_String("");
       end if;

       declare
          Trimmed_Length : constant Natural := Last - First + 1;
       begin
          if Trimmed_Length >= 3 then
             return Unbounded_Slice(Source, First, First + 2);
          else
             return Unbounded_Slice(Source, First, Last);
          end if;
       end;
    end NTrim;
The resulting compilation [1] shows a few things. Our whitespace map gets allocated and deallocated most of the time: a map is harder to treat as a constant, and the compiler doesn't always optimise that nicely. Most of the code is bounds checking; no off-by-one allowed here. Where First is greater than Last, you get a new full allocation.
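For comparison, the same first/last scan in the C model from earlier might look like the sketch below. `Span`, `is_ws` and `span_trim` are hypothetical names, and the whitespace set mirrors the one in NTrim (space, tab, LF, CR). It only narrows the pointer and length, so nothing is allocated. It also leaves out the three-character cap that NTrim applies, and an all-whitespace input simply comes back with length 0, where the Ada version allocates a fresh empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only view: pointer plus known length. */
struct Span {
    size_t length;
    const char *buffer;
};

/* Same whitespace set as the Ada example: space, HT, LF, CR. */
static int is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Narrow the view from both ends; no characters are copied. */
struct Span span_trim(struct Span src) {
    size_t first = 0;          /* index of the first kept character */
    size_t last = src.length;  /* one past the last kept character */
    struct Span out;

    while (first < last && is_ws(src.buffer[first])) {
        first++;
    }
    while (last > first && is_ws(src.buffer[last - 1])) {
        last--;
    }

    out.buffer = src.buffer + first;
    out.length = last - first;
    return out;
}

void span_trim_demo(void) {
    const char *text = "  \thi there\r\n";
    struct Span src = {strlen(text), text};
    struct Span trimmed = span_trim(src);

    assert(trimmed.length == 8);
    assert(memcmp(trimmed.buffer, "hi there", 8) == 0);
    assert(trimmed.buffer == text + 3); /* still points into the original */
}
```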

[0] https://github.com/gcc-mirror/gcc/blob/master/gcc/ada/libgna...

[1] https://godbolt.org/z/x8Erhqn5n


