Usual digital audio/video storage and playback is based on heavy
data compression. Depending on the algorithm, typical compression
ratios range from about 6 (MJPEG) to a couple of hundred (H.264, for
instance). So it is possible to pack a 2-hour movie into a file of
about 1GB, at a resolution of 1280x580 or similar.
But such sophisticated packing methods require a lot of CPU power
during playback: something in the range of a 2GHz Pentium 4, for
instance. MPEG-1, the first really good commercial packing method,
used by VCD, required something in the range of a Pentium 1 at 100MHz.
Before it we could see various less efficient and less CPU-hungry
codecs (a codec is actually a packing/depacking system), good for
30MHz and similarly fast CPUs.
But we have only an 8MHz, 16-bit 68000 in the STE. If we want
320x200 px at 12.5 fps or more, we can immediately forget any depacking
during playback. Even with some very primitive algorithm, 320x160 px at
12.5 fps needs a depacking rate of over 500KB/sec. It is just not
possible. And if hi-color is shown, we have even less CPU time, so
completely
The only thing we can do is simple loading of unpacked video
and audio data straight to their end destinations, avoiding any
intermediate data copying or processing. And even that by itself is
almost too demanding for the old Atari STE. Usual disk transfer rates
with modern drives and flash cards are around 1MB/sec. Someone could
say: that is much more than 500KB/sec, so let me use my UltraSatan,
which can do 1170KB/sec. It is possible to use almost the whole
mentioned data rate, but not with hi-color playback; only with 16-color
playback, which is already solved, with a nice 25 fps at 320x200 px.
Hi-color display on the Atari STE keeps the CPU busy during active
scanlines, and that is some 65% of the time at a resolution of 320x200.
OK, then let's use DMA for loading data, someone will say.
Unfortunately, this is exactly what does not work, and it is the main
reason why UltraSatan is not really good for this: DMA, when active,
stops the CPU for some cycles, and because the whole hi-color display
is based on accurate CPU timing, that timing will be destroyed, and you
will see garbage instead of nice video.
The only way with UltraSatan is to load during border periods (top and
bottom border), which means only some 33% of the time at normal
resolution. This is why I reduced the vertical resolution to 158 px:
there is then less data to load, while more time is available, about
48%. Problem solved. Actually, only one problem. There are many things
yet to take care of, and we need some unusual solutions.
If you try usual data loading with GEMDOS Trap #1 calls, it will
fail miserably, even with the fastest driver SW.
The reason is that we need short data loads in the little blank time
available, which is about 9.5 ms. In that time we need to load 21
sectors from the drive/Flash card. And with a speed above 1100KB/sec,
it is possible to load 21 sectors in 9.5 ms. But not via GEMDOS,
because data on AHDI partitions is organised differently. We have
so-called logical sectors, whose size is a multiple of the normal
512-byte hard disk sector. Atari introduced big sectors for so-called
BGM (Big GEM) partitions, with sizes over 32MB, up to 512MB. On a 512MB
partition, one logical sector is 8KB, or 16 normal sectors, long. And
that is the minimal size with which the hard disk driver SW operates.
Now, what happens when we tell GEMDOS to load 21x512=10752 bytes?
It will calculate how many logical sectors that takes, and will load as
many as needed to cover the complete requested data. In the case of a
512MB partition that is 2 logical sectors, or 16384 bytes. Hey! That
can overwrite some user data, because we asked for less. Right. The TOS
programmers knew about this, and therefore we have disk buffers. So, in
reality, data goes first from disk into a buffer, and then only the
proper number of bytes is copied to the end destination, to avoid
damaging user data. And that means, of course, a slower load. With
16-color playback, I got around the problem by always loading 16KB
blocks: then TOS is smart enough to load straight to the destination.
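The rounding described above can be checked with a little arithmetic.
The following is only a sketch, not code from the player: the function
names are made up for illustration, and it assumes that 1KB means 1024
bytes and that the drive sustains UltraSatan's 1170KB/sec with no seek
or command overhead.

```python
import math

SECTOR = 512                   # normal hard disk sector, bytes
LOGICAL_SECTOR = 16 * SECTOR   # 8KB logical sector of a 512MB BGM partition
RATE = 1170 * 1024             # assumed sustained rate, bytes/sec (1KB = 1024)
BLANK_MS = 9.5                 # blank time available per load, from the text


def logical_sectors_needed(nbytes, logical_size=LOGICAL_SECTOR):
    """Whole logical sectors that must be read to cover nbytes."""
    return math.ceil(nbytes / logical_size)


def load_time_ms(nbytes, rate=RATE):
    """Pure transfer time in milliseconds, ignoring any overhead."""
    return nbytes / rate * 1000.0


requested = 21 * SECTOR                                       # what we ask for
rounded = logical_sectors_needed(requested) * LOGICAL_SECTOR  # what gets read

print(requested, "bytes:", round(load_time_ms(requested), 2), "ms")
print(rounded, "bytes:", round(load_time_ms(rounded), 2), "ms")
```

Under these assumptions the direct 21-sector read fits into the blank
period, while reading two full logical sectors would not, even before
counting the extra copy out of the disk buffer.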
Here, we cannot go to 16KB blocks, because we are limited by the time
available for loading. So, the only way is to use direct disk access,
bypassing GEMDOS calls. With my experience in writing hard disk
drivers, that was not a problem for me. Then we can always load exactly
as many sectors as we need, straight to the destination.
However, this creates a new problem for us: how to locate the position
of a large AV file on the drive? Then, fragmentation. Fragmentation is
not an option here. It must be avoided, otherwise playback will be bad,
because the required loading speed cannot be achieved. So,
defragmenting is a must, which is good overall too, for work with the
computer.
After some thinking, and some initial overcomplicated and slow ideas, I
solved finding the file location on the drive in a pretty fast and
relatively simple way; the code is short in any case. Whoever is
interested may look it up in the source of the player SW.
So much about hi-color playback with UltraSatan. I don't expect any
improvements here - more fps and/or resolution is just not possible.
Actually, for many people even this will not work well.
Hi-color playback using cartridge port IDE adapter CATA:
The ST(E) cartridge port is 16-bit, like IDE (ATA) hard disks or
Flash cards. So, the idea of using it for IDE hard disk adapters is
natural, and one was already manufactured: Paskud.
I made something faster, using a very special way of writing to disk,
which always needs a tricky solution with the cart. port, because it is
a read-only design. Without going into too much detail, I'll focus only
on things related to speed.
If we want really good quality AV playback, we need pretty high loading
rates. 320x200 px with 80 colors/line and 30K colors needs a 2.5MB/sec
loading rate. Or Overscan 416x228 px with 48 colors/line: 2.4MB/sec.
Such speeds are higher than any existing hard disk adapter can reach on
the STE. The max is about 1800KB/sec with ICD Link2. My special ACSI-CF
adapter can do some 1900KB/sec, and that is really the top for the ACSI
port.
Then how to load 2.5MB/sec? To understand the solution, we first need
to know a little about Atari ST(E) RAM and bus speeds. RAM in the ST(E)
has a 250 ns cycle and is 16 bits wide. It means that the max data rate
is 8MB/sec. Looks promising... But RAM holds video data too, which is
constantly read for displaying the screen. That uses exactly half of
the RAM bandwidth, 4MB/sec (it needs less, but the logic is made so
that even in blanks you still cannot use the whole RAM bandwidth).
Anyway, still 4MB/sec. So why then only less than 2MB/sec? The CPU has
constant access to RAM. Its RAM access cycle is 500 ns, so the CPU can
load RAM at max 4MB/sec (in peak). The DMA chip is designed for a max
speed of 2MB/sec, and then it slows down the CPU by about 50%. If DMA
went at 4MB/sec, the CPU would have to be stopped completely. Anyway,
as we said earlier, DMA is not good for hi-color playback. And even
with 4MB/sec, we could not go much over 1MB/sec at some bigger
resolution.
The CPU can do 4MB/sec, as was said. Yeah, but the usual way of data
transfer is: first load data from the source address into the CPU, and
then write it to the dest address. It means that the max transfer rate
is actually only 2MB/sec, in peak. It would be good if the CPU had an
instruction to write from some external port directly to some RAM
address. I call it semi-DMA. But the 68000 has no such instruction. And
likely neither do other CPUs.
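The bandwidth figures above follow from the cycle times alone. As a
rough sketch (the helper name is made up for illustration, and 1MB is
taken as 1,000,000 bytes, which is what makes the numbers in the text
come out round):

```python
BUS_BYTES = 2        # 16-bit wide data bus
RAM_CYCLE_NS = 250   # RAM cycle time, from the text
CPU_CYCLE_NS = 500   # CPU RAM access cycle, from the text


def mb_per_sec(bytes_per_access, cycle_ns):
    """Peak rate in MB/sec for one access of given width per cycle."""
    return bytes_per_access * (1e9 / cycle_ns) / 1e6


ram_total = mb_per_sec(BUS_BYTES, RAM_CYCLE_NS)  # whole RAM bandwidth
video_share = ram_total / 2                      # half is kept for the screen
cpu_peak = mb_per_sec(BUS_BYTES, CPU_CYCLE_NS)   # CPU reading or writing only
copy_peak = cpu_peak / 2                         # read + write for every word

print(ram_total, video_share, cpu_peak, copy_peak)
```

The last value is the point of the paragraph: an ordinary copy needs
two bus accesses per word, so it can never beat half of the CPU peak.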
Still, we can achieve it, with a little hack in the machine, proper
logic in the cart. port adapter and special SW. We need to set the
machine in a special state: all interrupts disabled, no DMA activity
(but audio DMA can go on, luckily). Then the logic of the cart. port
adapter will invert the R/W line from CPU to MMU/Shifter at any read
from RAM or Shifter (here meaning such a command given to the CPU).
Because of the inversion, data will not be read but written there,
which is what we want when reading from the hard disk. And from where
will the data be read? From the IDE port, which is activated in
parallel with inverting the R/W line. Each such cycle advances the IDE
internal counter by 2 bytes, so data loads sequentially. This is the
essence. And we can achieve a peak speed of some 3.4MB/sec. Not exactly
4MB, and one of the reasons is CPU bug with
The following is for people who know a little about 68000 coding:
For reversed R/W loading from disk, we may use movem as the fastest way.
Then something like:
movem.w (a6),d0-d7 will load 18 bytes into RAM at the address in a6.
Pardon? 8x2 is 16, not 18, someone says. Right, but there is a bug in
the CPU, and it always performs one cycle more than needed. Normally,
when it is a read and not a write, those 2 bytes are just lost, but in
reversed mode they will be written into RAM, and luckily to the correct
place. And the bug is the reason why we cannot use movem.l ... to
transfer a multiple of 4 bytes: it will always be 2 more bytes than the
command says.
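To make the byte counts concrete, here is a small model of the extra
bus cycle described above. It is only an illustration of the
arithmetic, not part of the transfer routines; the function name is
made up.

```python
def movem_read_bytes(num_regs, size):
    """Bytes moved over the bus by a memory-to-register movem.

    size is "w" (2 bytes per register) or "l" (4 bytes per register).
    The added 2 bytes model the one surplus word access from the text.
    """
    per_reg = {"w": 2, "l": 4}[size]
    return num_regs * per_reg + 2


print(movem_read_bytes(8, "w"))   # movem.w (a6),d0-d7
print(movem_read_bytes(8, "l"))   # movem.l (a6),d0-d7
```

With movem.l the total is never a multiple of 4, whatever the number
of registers, which is the problem mentioned in the text.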
The SW for transfer must be executed from cart. port ROM; then the
logic can simply determine whether it is a code fetch or a RAM access.
Parameters should never be read from RAM, since that triggers the IDE
port. And interesting is that something like dbf (dbra) causes bad data
transfer. I think that the reason is that dbf (and bra instructions)
have cycle counts not divisible by 4, which confuses the MMU logic. So,
branches only with jmp, but there is no real need for them: we need
only a few routines to be placed in
Still not fast enough! Over 3MB/sec, and not enough? Yes.
Calculate a little: for 320x200 px and 25 fps we need to load a 32KB
bitmap + about 20KB color data + sound data in 1/25 sec. In 33% of the
time. 50x25=1250 KB/sec x 3 is more than our tricky cart. port reading
can do.
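The same calculation as a short sketch. The names are made up, sound
data is left out because its size is not given, and the 3.4MB/sec peak
is simply taken as 3400KB/sec. The text rounds the frame to 50KB and
the time factor to 3, which gives 3750KB/sec; the conclusion is the
same either way.

```python
BITMAP_KB = 32        # 320x200 bitmap, from the text
COLOR_KB = 20         # color data, "about 20KB" in the text
FPS = 25
LOAD_FRACTION = 0.33  # share of time usable for loading (border periods)
PEAK_KB_S = 3400      # cart. port peak, taken as 3400KB/sec


def required_burst_rate(kb_per_frame, fps=FPS, fraction=LOAD_FRACTION):
    """KB/sec needed while loading, if only `fraction` of time is usable."""
    return kb_per_frame * fps / fraction


average = (BITMAP_KB + COLOR_KB) * FPS            # KB/sec, sound not counted
burst = required_burst_rate(BITMAP_KB + COLOR_KB)

print(average, round(burst), burst > PEAK_KB_S)
```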
The solution is in reading color data not into RAM, but straight into
the Shifter; the whole concept is based on loading straight to the end
destination. And it is possible, with carefully written code and newer,
fast Compact Flash cards, to have a synchronized load right from the
IDE port. And not just the color load of the usual PCS format, 48
colors/line, but more: even 80 colors/line. Looks better. Furthermore,
because we show the same bitmap twice (25 fps and 50 vertical scans),
we can use 2 slightly different sets of color data to achieve more
perceived color nuances, about 30000. It needs 2 different sets of
color data, and that makes the 2.5MB/sec rate.
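A rough check of where a figure like 2.5MB/sec can come from. This is
an estimate under stated assumptions, not the player's actual layout:
4 bits per pixel for the bitmap, one 16-bit word per color, one set of
color data per vertical scan, and sound not counted. All names are
made up for illustration.

```python
WIDTH, HEIGHT = 320, 200
BITS_PER_PIXEL = 4     # assumed: 16-color bitmap
BYTES_PER_COLOR = 2    # assumed: one 16-bit palette word per color
FPS = 25
SCANS_PER_FRAME = 2    # same bitmap shown on 2 of the 50 vertical scans


def bytes_per_frame(colors_per_line, color_sets=SCANS_PER_FRAME):
    """Bitmap plus color data loaded for one 1/25 sec frame."""
    bitmap = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
    colors = colors_per_line * BYTES_PER_COLOR * HEIGHT * color_sets
    return bitmap + colors


rate_80 = bytes_per_frame(80) * FPS   # bytes/sec, without sound
print(bytes_per_frame(48, 1))         # one PCS-like color set, for comparison
print(rate_80 / 1e6, "MB/sec")
```

With these assumptions, picture data alone comes to 2.4MB/sec, so the
quoted 2.5MB/sec would leave roughly 0.1MB/sec for sound; that split
is a guess, not something stated in the text.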
With Overscan, we must load even part of the bitmap data interleaved
with the color data, because there is even less time in border periods.
Then we cannot have more than 48 colors/line, but it may still look
good.
Dec. 13 2012. P. Putnik