ENH: Added np.char.slice_ by madphysicist · Pull Request #20694 · numpy/numpy

For this PR, I can probably get rid of chunksize. Right now, with the updates to dtype, assuming a.dtype == np.dtype('S1'), we can get the main functionality of slice_ implemented without any copying as

 a[..., None].view('S1')[..., start:end].view(f'S{end - start}').squeeze()

(The squeeze is figurative). I rather like the idea of being able to sample arbitrary characters from the string, in forward, backward and even reverse order. I think we can still do that with just view and slicing. I can see how potentially overlapping chunks of arbitrary size may be a bit of a stretch (though I really like the idea), so the modifications to as_strided are not necessarily a prerequisite for this PR.
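To make the view-and-slice idea concrete, here is a minimal sketch. The helper name `slice_view` is hypothetical and not part of this PR, and the sketch assumes a fixed-width bytes array on a NumPy version recent enough that `view` only needs the last axis to be contiguous (1.23 or later, as far as I know). The slice is applied to the trailing character axis, and the "figurative" squeeze is spelled as an index on that axis.

```python
import numpy as np

def slice_view(a, start, end):
    """Hypothetical helper: bytes [start, end) of every element, without copying."""
    # Add a trailing axis, then reinterpret each fixed-width element
    # as a row of single bytes along that axis.
    chars = a[..., None].view('S1')
    # Slice the character axis and fuse the surviving bytes back into one string.
    fused = chars[..., start:end].view(f'S{end - start}')
    # Drop the length-1 trailing axis left over from the fusion.
    return fused[..., 0]

a = np.array([b'hello', b'world'], dtype='S5')
out = slice_view(a, 1, 4)
print(out)                       # substrings of each element
print(np.shares_memory(out, a))  # the result is a view on a's memory
```

Replacing `start:end` with an index array would cover the arbitrary-order sampling case, but fancy indexing copies, so that variant would no longer be a view.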

offset: number of bytes to add to current array's base address when viewing. This seems like a natural addition when working with strings. I actually got the idea from np.ndarray. The challenge I see here is dealing with subclasses, but it's my understanding that all arrays implement the buffer protocol (I may be very very wrong about that).

dtype: datatype with which to view elements. This one seems pretty straightforward. Given that you can completely screw up just about everything just using strides and shape, I don't see any reason not to trust a competent user to be able to change the dtype as well.

With these two changes, the snippet above could become

as_strided(a, offset=start * a.dtype.itemsize, dtype=f'S{end - start}')

In some ways, I find this easier to understand, since it only makes a single coherent transformation rather than multiple changes.
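Since `as_strided` does not take `offset` or `dtype` today, the proposed call can only be approximated. A rough sketch using the existing `np.ndarray` constructor, which already accepts `buffer`, `offset`, `dtype`, and `strides`, might look like the following. The function name `strided_substring` is hypothetical, and the sketch assumes a contiguous bytes array (so it can serve as a buffer) where one character is one byte, so the byte offset equals `start`.

```python
import numpy as np

def strided_substring(a, start, end):
    """Hypothetical stand-in for as_strided(a, offset=..., dtype=...)."""
    return np.ndarray(
        shape=a.shape,
        dtype=f'S{end - start}',  # narrower element type for the substring
        buffer=a,                 # reuse a's memory through the buffer protocol
        offset=start,             # bytes to skip from the start of the buffer
        strides=a.strides,        # still step one full original element at a time
    )

a = np.array([b'hello', b'world'], dtype='S5')
sub = strided_substring(a, 1, 4)
print(sub)
```

This keeps the original strides while shrinking the item size, which is the single transformation described above. It does not handle subclasses or non-contiguous inputs.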