Let me try to clarify things (I know it is a bit confusing):
Yes, download to flash is free. This means that all flash loaders that we have
integrated in the J-Link software can be used with any J-Link or OEM product such as
SAM-ICE or Midaslink without requiring an extra license.
The supported devices are basically all popular ARM7/9 or Cortex-M3 / M0 microcontrollers
with built-in flash.
Which devices are supported?
You should either look it up in the manual or (better) check the drop-down list in the
J-Link control panel which lets you select the microcontroller.
Devices like AT91SAM7xxx, AT91SAM3xxx, LPC23xx, LPC24xx or STM32 are of course supported,
as are many others.
Can I use the free flash download in production?
The idea we at Segger had was different. We made our flash loaders available because we felt
that the flash loaders that come with a tool chain (in this case IAR) often did not work reliably
or were not nearly as fast as ours. We felt this makes it easier to use a lot of tool chains (such as IAR, Keil,
but also GDB) with J-Links. So the plan was to use the J-Link flash download for development purposes;
in other words, the debugger loads the program and J-Link takes care of programming the flash.
Of course, this can also be used in production. However, if you do, you are responsible for telling J-Link
which device it is programming and for loading the program. There are multiple ways to do this:
As shown above, you can use GDB with a script. You could also use J-Link Commander (free, in the
software and documentation pack) to do the same thing, so you do not need to use GDB. This probably keeps it
a little simpler.
Another option is to buy the SDK (398 Euros) and write a little application (typically in C) which does the same thing.
This is not a difficult task, especially since the source code of J-Link Commander is part of the SDK.
If there is interest in how to program the flash using J-Link Commander, let me know; we can
post a small sample script.
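To give a rough idea of the J-Link Commander route, here is a minimal sketch in Python that only builds
the text of a command file and the matching command line; it does not run anything. It is not an official
Segger sample. The commands (r, h, loadbin, g, q) and the options (-device, -if, -speed, -CommanderScript)
are assumed from recent J-Link software releases and should be checked against the manual for your version.
The function names, file names, device name and flash address are made up for illustration.

```python
def build_commander_script(image_path, flash_base):
    """Return the text of a J-Link Commander command file (assumed command set)."""
    commands = [
        "r",                                          # reset the target
        "h",                                          # halt the CPU
        f"loadbin {image_path}, 0x{flash_base:08X}",  # download the binary to the flash address
        "r",                                          # reset again so the new program starts cleanly
        "g",                                          # let the CPU run
        "q",                                          # quit J-Link Commander
    ]
    return "\n".join(commands) + "\n"


def build_command_line(device, script_path, interface="JTAG",
                       speed_khz=4000, exe="JLink.exe"):
    """Return the argument list that would start J-Link Commander with the script.

    The option names are assumptions based on recent J-Link releases.
    """
    return [
        exe,
        "-device", device,           # tells J-Link which microcontroller it is programming
        "-if", interface,            # target interface, e.g. JTAG or SWD
        "-speed", str(speed_khz),    # interface speed in kHz
        "-CommanderScript", script_path,
    ]
```

In a production setup you would write the returned script text to a file (for example flash.jlink) and
start the returned command line, for instance with subprocess.run, once per board. The device name is
the same one you would pick in the drop-down list of the J-Link control panel.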
JTAG Isolation
In general, when you are using J-Link to program your micro in a production environment, we recommend you also
use the JTAG Isolator. This will protect J-Link against voltage spikes and different ground potentials, as well
as protect the PC used and your target hardware.
So what about J-Flash?
J-Flash is a program which requires an extra license. It can also program external flash on basically any system,
which is something the flash loaders in the J-Link software cannot do (at least not now).
J-Flash is also used as the setup program for our stand-alone programmer, Flasher ARM.
So: To use J-Flash with a standard J-Link, you still need a license. With Flasher ARM or J-Link PRO, that license
is already included. If the flash download from the debugger or J-Link Commander is all you need,
you do not need anything else; no extra license is required.
So the choice is yours. Hope this helps to clarify it.
But I could always download from the IAR debugger into flash. How do the J-Link flash loaders differ?
The flash loaders in the J-Link software are optimized for both the target system and J-Link. They are very fast,
typically much faster than the flash loaders that come with EWARM.
And they work very reliably and do not typically need any setup information (all we need to know is
which micro is used).
Try it out. Typically, all you have to do is disable the flash loader of the debugger.
The J-Link flash loader should then automatically take over when you download a program into the flash.
Why are the J-Link flash loaders so fast?
Because we know what we are doing ...
We take full advantage of the available RAM, avoid operations that are not required,
and let the processor program as much as possible at once.
But we also typically check if a sector already contains the correct program
(something that happens quite frequently when modifying a program ... Sometimes
a lot of sectors remain unchanged when you make a small change in the program).
This check, just like verification, is typically done by a fast 32-bit CRC algorithm which runs on the
target controller.
Another reason is that we typically set up the PLL to let the target CPU run at high speed,
which also helps to accelerate programming and CRC computation.
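The "skip sectors that are already correct" idea can be illustrated with a small host-side simulation in
Python. This is only a sketch of the principle, not the actual J-Link flash loader: in the real thing the
CRC over the flash contents is computed on the target, whereas here both sides are plain byte strings.
zlib.crc32 stands in for the 32-bit CRC (the exact CRC variant used by J-Link is not stated here), and
the function name and the sector size in the usage note are made up for illustration.

```python
import zlib


def sectors_to_program(current_flash, new_image, sector_size):
    """Return the indices of sectors whose contents differ from the new image.

    current_flash: bytes currently in the flash (simulated here on the host)
    new_image:     bytes of the program that should end up in the flash
    sector_size:   size of one flash sector in bytes
    """
    changed = []
    for start in range(0, len(new_image), sector_size):
        new_sector = new_image[start:start + sector_size]
        old_sector = current_flash[start:start + sector_size]
        # Compare one 32-bit CRC per sector instead of reading back every byte.
        if zlib.crc32(new_sector) != zlib.crc32(old_sector):
            changed.append(start // sector_size)
    return changed
```

For example, with a 1 KB image and 256-byte sectors, changing a single byte at offset 300 means only
sector 1 has to be erased and programmed; the other three sectors are skipped.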
Hope this helps to clarify things ...
Rolf