Semi-Colons in HTML - for e-books


Captcha
09-06-2011, 01:58 AM
I'm learning to format e-books, and the set of instructions that I've been following warns against the number-based codes in favour of the letter-based codes, e.g. it suggests using &ldquo; instead of &#8220; for left double quotation marks.

My problem is that I can't find the letter-based code for a semi-colon. Is there such a code? If I can't find one, is it okay to mix the number-based code in with the letter-based codes, or are the letters really better?

ETA: Apparently AWWC prefers the number-based codes, because it replaced my number code for the quotation marks with the actual quotation marks. But you get the idea, right?

Thanks for any help.

Medievalist
09-06-2011, 03:23 AM
You only have to use entities or Unicode for diacritics, glyphs, and characters that are not represented in ASCII.

A semicolon is in ASCII, just as "straight" (non-curly, non-"printers'") quotes are.

Just type a semicolon.
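
A minimal Python sketch of that rule, assuming the book text is held in a plain string: every ASCII character (semicolons included) is left alone, and only the rest are turned into entities. The function name `escape_non_ascii` is made up for illustration; `html.entities.codepoint2name` is the standard library's table of HTML 4 letter-based names.

```python
from html import unescape
from html.entities import codepoint2name


def escape_non_ascii(text: str) -> str:
    """Replace non-ASCII characters with HTML entities; leave ASCII untouched."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII (letters, digits, semicolons, ...) is typed as-is.
            parts.append(ch)
        elif code in codepoint2name:
            # Letter-based (named) entity, e.g. &ldquo;
            parts.append(f"&{codepoint2name[code]};")
        else:
            # Number-based fallback for characters with no HTML 4 name, e.g. &#257;
            parts.append(f"&#{code};")
    return "".join(parts)


sample = "\u201cWait; stop,\u201d she said."
escaped = escape_non_ascii(sample)
print(escaped)  # &ldquo;Wait; stop,&rdquo; she said.

# Both spellings of the left double quotation mark decode to the same character.
print(unescape("&ldquo;") == unescape("&#8220;"))  # True
```

This sketch deliberately ignores &, < and >, which are ASCII but still need escaping in real HTML; the standard `html.escape` function handles those.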

Captcha
09-06-2011, 03:45 AM
Well, that's deceptively simple. Everything else has been annoyingly complicated...

Damn, am I ever a pessimist. I totally trust your expertise, and yet... I am simultaneously confident in my own ability to screw this up!

But, okay. I'm not replacing the bastards. Semicolons are in ASCII (as if I know what that means!).

Thanks.

Allen R. Brady
09-06-2011, 03:59 AM
Semicolons are in ASCII (as if I know what that means!).

In a nutshell, it means it's a standard character that will be available in any font set, just as a number or letter would be.

Medievalist
09-06-2011, 04:04 AM
But, okay. I'm not replacing the bastards. Semicolons are in ASCII (as if I know what that means!).

Thanks.

ASCII is the American Standard Code for Information Interchange.

It goes back to the dawn of computers. It means, in crude terms, the basic characters that are on an English typewriter are universal. Each of the 26 letters of the English/Roman alphabet, without diacritics or accent marks, the digits 0-9, the basic punctuation marks, and a handful of other symbols are pretty much universally agreed on in terms of how they are encoded, represented and displayed.

You get 'em for free :D They can be simply typed and should display properly.
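
One way to see what "universally agreed on" means, sketched in Python: ASCII text comes out as the same bytes under several common encodings, while a curly quote does not. The encodings listed are just illustrative picks, not an exhaustive set.

```python
# ASCII characters encode to identical bytes in these common encodings.
encodings = ["ascii", "latin-1", "cp1252", "utf-8"]

ascii_text = "Wait; stop."
ascii_bytes = {ascii_text.encode(enc) for enc in encodings}
print(len(ascii_bytes))  # 1: every encoding produced the same bytes

# A curly quote is outside ASCII, so the encodings disagree about it.
curly = "\u201c"
print(curly.isascii())         # False
print(curly.encode("cp1252"))  # b'\x93'
print(curly.encode("utf-8"))   # b'\xe2\x80\x9c'
```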

Captcha
09-06-2011, 04:07 AM
Excellent. I will add ASCII to my store of knowledge.

If I keep this 'learning' thing going, I could become technologically competent, and once that happens... look out!

Xelebes
09-06-2011, 04:10 AM
The 27th and 28th letters are the ampersand and the amphere?

Medievalist
09-06-2011, 05:03 AM
The 27th and 28th letters are the ampersand and the amphere?

They're a typo because my allergies are very very bad today, which means my eyes are swollen and I can't read . . . .

fixed.

Here's a fairly standard ASCII chart showing the ASCII number, the hex, and the binary.

http://www.ascii-code.com/

Xelebes
09-06-2011, 05:27 AM
They're a typo because my allergies are very very bad today, which means my eyes are swollen and I can't read . . . .

fixed.

Here's a fairly standard ASCII chart showing the ASCII number, the hex, and the binary.

http://www.ascii-code.com/

Supposedly one dodgy source says that the ampersand was the 27th letter, so it must be true!

(It might be, but I can't trust the source from where I got it.)

Medievalist
09-06-2011, 06:09 AM
Supposedly one dodgy source says that the ampersand was the 27th letter, so it must be true!

(It might be, but I can't trust the source from where I got it.)

Yeah, that's just more wronger than my typo [sic]

The letters don't start until character 65 (uppercase A).

In the early days everything was a character: spaces, end-of-line, form feed, bell, everything. A large chunk of ASCII (codes 0-31, plus 127) is devoted to these "control characters," and they're still embedded in modern documents today.

Take a look at this chart (http://www.ascii-code.com/).
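
The layout of the chart can be checked with a few lines of Python. This is only a sketch of the ranges discussed here, not a full table.

```python
# Code points of a few characters mentioned in this thread.
print(ord("A"))  # 65: the uppercase letters start here
print(ord("a"))  # 97: the lowercase letters start here
print(ord(";"))  # 59: the semicolon
print(ord("&"))  # 38: the ampersand sits among the punctuation

# Control characters occupy codes 0-31 plus 127 (DEL).
control_codes = [c for c in range(128) if c < 32 or c == 127]
print(len(control_codes))  # 33

# Bell, line feed, form feed, and carriage return are among them.
for name, ch in [("bell", "\a"), ("line feed", "\n"),
                 ("form feed", "\f"), ("carriage return", "\r")]:
    print(name, ord(ch))
```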

Mac H.
09-06-2011, 06:47 AM
Technically ASCII is only 7 bits - from 0 to 127.

From 128 to 255 is often called 'extended ASCII' but isn't actually ASCII.

The link everyone is giving, www.ascii-code.com, is actually wrong: it claims to give the ISO/IEC 8859-1 (http://en.wikipedia.org/wiki/ISO/IEC_8859-1) definition of the upper range.

However, it doesn't. It looks like Windows-1252, which is a bit different: http://en.wikipedia.org/wiki/Windows-1252

It's a common mistake and most web browsers cope. For example, they will render "&#128;" (or "&#x80;") as the euro symbol even though that isn't correct; the correct reference is "&#8364;" (or "&#x20AC;").

See how they look in your browser:
&#128;  : €
&#8364; : €
Most browsers will cope. But it isn't actually right.

I know - nobody cares about this. But the table referenced was neither 'extended ASCII' nor 'ISO/IEC 8859-1'!

If you don't believe me - look at this:
* http://www.fileformat.info/info/unicode/char/0080 <-- Nothing ..
* http://www.fileformat.info/info/unicode/char/20AC <-- Euro
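
The same mismatch shows up in Python's standard codecs, sketched below. `html.unescape` follows the HTML5 parsing rules, which, as far as I know, deliberately map the 128-159 range to the Windows-1252 characters, so it is lenient in the same way the browsers above are.

```python
from html import unescape

byte_80 = b"\x80"

# Windows-1252 puts the euro sign at byte 0x80.
print(byte_80.decode("cp1252") == "\u20ac")   # True

# ISO/IEC 8859-1 maps byte 0x80 to the control character U+0080.
print(byte_80.decode("latin-1") == "\u0080")  # True

# The euro sign's real Unicode code point is 0x20AC, i.e. 8364.
print(ord("\u20ac"))                          # 8364

# Lenient decoding: the "wrong" reference still yields the euro sign.
print(unescape("&#128;") == unescape("&#8364;") == "\u20ac")  # True

# Strict ASCII stops at 127, so byte 0x80 is rejected outright.
try:
    byte_80.decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```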

Mac

Steven_Lake
09-13-2011, 05:10 AM
Captcha: Silly question, but what e-book format are you trying to build your files for? It probably won't matter formatting-wise, but sometimes it does. I work with EPUB exclusively since I'm a big FOSS fan and don't like proprietary formats. But that's a whole other discussion. Anywho, lol, knowing that could possibly help with the discussion. :)

Captcha
09-13-2011, 05:18 AM
I did PDF, MS Reader, .mobi and EPUB. But I was doing the HTML stuff in an editor, and then doing the different formats in Calibre.

As far as I can tell, it worked... http://www.allromanceebooks.com/product-taken-599945-145.html

Steven_Lake
09-13-2011, 07:43 AM
Awesome! :D