Question:

I doubt if this is encryption but I can't find a better phrase. I need to pass a long query string like this:

http://test.com/test.php?key=[some_very_loooooooooooooooooooooooong_query_string]

The query string contains NO sensitive information, so I'm not really concerned about security in this case. It's just...well, too long and ugly. Is there a library function that can let me encode/encrypt/compress the query string into something similar to the result of an md5() (similar as in, always a 32-character string), but that can be decoded/decrypted/decompressed?

Answer:

The basic premise is very difficult. Transporting any value in the URL means you're restricted to a subset of ASCII characters. Using any sort of compression like gzcompress would reduce the size of the string, but result in a binary blob. That binary blob can't be transported in the URL though, since it would produce invalid characters. To transport that binary blob using a subset of ASCII you need to encode it in some way and turn it into ASCII characters.

So, you'd turn ASCII characters into something else which you'd then turn into ASCII characters.

But actually, most of the time the ASCII characters you start out with are already the optimal length. Here's a quick test:

$str = 'Hello I am a very very very very long search string';
echo $str . "\n";
echo base64_encode(gzcompress($str, 9)) . "\n";
echo bin2hex(gzcompress($str, 9)) . "\n";
echo urlencode(gzcompress($str, 9)) . "\n";

Hello I am a very very very very long search string
eNrzSM3JyVfwVEjMVUhUKEstqkQncvLz0hWKUxOLkjMUikuKMvPSAc+AEoI=
78daf348cdc9c957f05448cc554854284b2daa442772f2f3d2158a53138b9233148a4b8a32f3d201cf801282
x%DA%F3H%CD%C9%C9W%F0TH%CCUHT%28K-%AAD%27r%F2%F3%D2%15%8AS%13%8B%923%14%8AK%8A2%F3%D2%01%CF%80%12%82

As you can see, the original string is the shortest. Among the encoded compressions, base64 is the shortest since it uses the largest alphabet to represent the binary data. It's still longer than the original though.

For some very specific combination of characters with some very specific compression algorithm that compresses to ASCII-representable data, it may be possible to achieve some compression, but that's rather theoretical.

Update: Actually, that sounds too negative. The thing is, you need to figure out if compression makes sense for your use case. Different data compresses differently, and different encoding algorithms work differently. Also, longer strings may achieve a better compression ratio. There's probably a sweet spot somewhere where some compression can be achieved. You need to figure out if you're in that sweet spot most of the time or not.
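One way to deal with not knowing whether a given value is in the sweet spot is to decide per value. The following is only a sketch under assumptions: the function names and the one-character prefix ('p' for plain, 'z' for deflated) are made up for illustration, and the returned value is meant to be passed through urlencode() or http_build_query() like any other parameter.

```php
<?php
// Hypothetical helper: keep whichever form is shorter once it sits in the URL.
// Prefix 'p' = plain text, 'z' = deflate + URL-safe base64.
function pack_param(string $value): string
{
    $plain  = 'p' . $value;
    $packed = 'z' . rtrim(strtr(base64_encode(gzdeflate($value, 9)), '+/', '-_'), '=');

    // Compare the lengths as they would appear after percent-encoding.
    return strlen(rawurlencode($packed)) < strlen(rawurlencode($plain)) ? $packed : $plain;
}

// Reverse of pack_param(); expects the already percent-decoded value (e.g. from $_GET).
function unpack_param(string $param): string
{
    $body = (string) substr($param, 1);
    if ($param !== '' && $param[0] === 'z') {
        // gzinflate() returns false on corrupt data; the cast turns that into ''.
        return (string) gzinflate(base64_decode(strtr($body, '-_', '+/')));
    }
    return $body;
}
```

Short values stay readable with only one extra character, while long or repetitive values get compressed.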

Something like md5 is unsuitable since md5 is a hash, which means it's non-reversible. You can't get the original value back from it.
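A quick illustration of why a hash cannot be reversed: the output is always 32 hex characters no matter how long the input is, so many different inputs must share the same output.

```php
<?php
$short = md5('a');
$long  = md5(str_repeat('a', 10000));

// Both digests have the same fixed length, regardless of input size.
var_dump(strlen($short)); // int(32)
var_dump(strlen($long));  // int(32)
```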

I'm afraid you can only send the parameter via POST, if it doesn't work in the URL.

Answer:

You could try a combination of gzdeflate (raw deflate format) to compress your data and base64_encode to use only those characters that are allowed without percent-encoding (additionally replacing the characters + and / with - and _):

$output = rtrim(strtr(base64_encode(gzdeflate($input, 9)), '+/', '-_'), '=');

And the reverse:

$output = gzinflate(base64_decode(strtr($input, '-_', '+/')));
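The two one-liners can also be wrapped into a pair of functions that reject malformed input. This is a sketch, not part of the original answer: the function names are hypothetical, and returning null on failure is just one possible convention.

```php
<?php
// Compress and encode with the URL-safe base64 alphabet, without '=' padding.
function url_compress(string $input): string
{
    return rtrim(strtr(base64_encode(gzdeflate($input, 9)), '+/', '-_'), '=');
}

// Reverse of url_compress(); returns null if the token is not valid.
function url_decompress(string $token): ?string
{
    $b64 = strtr($token, '-_', '+/');
    // Restore the '=' padding that url_compress() stripped.
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);

    $binary = base64_decode($b64, true); // strict mode: false on invalid characters
    if ($binary === false) {
        return null;
    }

    $plain = @gzinflate($binary);        // false (plus a warning) on corrupt data
    return $plain === false ? null : $plain;
}
```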

Here is an example:

$input = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

// percent-encoding on plain text
var_dump(urlencode($input));

// deflated input
$output = rtrim(strtr(base64_encode(gzdeflate($input, 9)), '+/', '-_'), '=');
var_dump($output);

The saving in this case is about 23%. But the actual efficiency of this compression procedure depends on the data you are using.

Answer:

This works great for me:

$out = urlencode(base64_encode(gzcompress($in)));

Saves a lot.

Input: 'Hello I am a very very very very long search string' (51 characters)
Output: 64 characters

Input: 500 characters
Output: 328 characters

Input: 1000 characters
Output: 342 characters

Input: 1500 characters
Output: 352 characters

So the longer the string, the better the compression. The compression level parameter doesn't seem to have any effect.

Answer:

Update:
gzcompress() won't help you. For example if you take Pekka's answer:

String length: 640
Compressed string length: 375
URL encoded string length: 925
(with base64_encode, it is only 500 characters ;) )

So this way (passing the data via the URL) is probably not the best way...

If you don't exceed the URL length limits with the string, why do you care what the string looks like? I assume it gets created, sent and processed automatically anyway, doesn't it?

But if you want to use it as e.g. some kind of confirmation link in an email, you have to think about something short and easy to type for the user anyway. You could, e.g. store all the needed data in a database and create some kind of token.
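A minimal sketch of the token idea, assuming an in-memory array as a stand-in for the database table. The class and method names are hypothetical; a real version would persist the rows and probably expire them.

```php
<?php
// Hypothetical in-memory stand-in for a database table (token => query string).
final class TokenStore
{
    /** @var array<string, string> */
    private array $rows = [];

    // Store the long query string and return a short random token for the URL.
    public function put(string $queryString): string
    {
        $token = bin2hex(random_bytes(16)); // 32 hex characters, same length as md5() output
        $this->rows[$token] = $queryString;
        return $token;
    }

    // Look the query string up again; null if the token is unknown.
    public function get(string $token): ?string
    {
        return $this->rows[$token] ?? null;
    }
}

$store = new TokenStore();
$token = $store->put('a=1&b=2&c=some+very+long+value');
echo 'http://test.com/test.php?key=' . $token, "\n";
```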


Maybe gzcompress() can help you. But this will produce characters that are not allowed in URLs, so you will have to use urlencode() too (which makes the string longer and ugly again ;) ).

Answer:

Basically, it's like they say: compress the text, and send it encoded in a useful way. But:

1) Common compression methods add overhead because of their dictionaries. If the data is always an arbitrary sequence of known chunks (as words or syllables[3], numbers and some symbols are in a text), you could always use the same static dictionary and not send it (don't paste it into the URL). That saves the space the dictionary would take.

1.a) If you are already sending the language (or if it's always the same), you could generate one dictionary per language.

1.b) Take advantage of format limitations. If you know it's a number, you can encode it directly (see 3). If you know it's a date, you could encode it as Unix time[1] (seconds since 01/01/1970), so "21/05/2013 23:45:18" becomes "519C070E" (hex); if it's a day of the year, you could encode it as days since New Year, counting 29/02 (25/08 would be 237).

1.c) You know email addresses have to follow certain rules, and usually come from the same few providers (gmail, yahoo, etc.). You could take advantage of that to compress them with your own simple method:

[email protected],[email protected],[email protected] => samplemail1:1,samplemail2:5,[email protected]:1
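The date encodings from 1.b can be checked with PHP's built-in date functions. This sketch assumes the example date is in UTC and uses a leap year (2012) for the day-of-year case so that 29/02 is counted.

```php
<?php
// "21/05/2013 23:45:18" (UTC) as a Unix timestamp, written in hex.
$timestamp = gmmktime(23, 45, 18, 5, 21, 2013); // hour, minute, second, month, day, year
$hex = strtoupper(dechex($timestamp));
echo $hex, "\n";                                // 519C070E

// Decoding: hex back to a formatted UTC date.
$decoded = gmdate('d/m/Y H:i:s', hexdec($hex));
echo $decoded, "\n";                            // 21/05/2013 23:45:18

// Days since New Year (0-based) in a leap year, so 29/02 has its own slot.
$dayIndex = (int) gmdate('z', gmmktime(0, 0, 0, 8, 25, 2012));
echo $dayIndex, "\n";                           // 237
```

The hex form takes 8 characters instead of 19, and the day index takes at most 3.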

2) If the data follows patterns, you can use that to help compression. For example, if it always follows this pattern:

name=[TEXT 1]&phone=[PHONE]&mail=[MAIL]&desc=[TEXT 2]&create=[DATE 1]&modified=[DATE 2]&first=[NUMBER 1]&last=[NUMBER 2]

You could:

2.a) Ignore the fixed text, and compress just the variable text. Like:

[TEXT 1]|[PHONE]|[MAIL]|[TEXT 2]|[DATE 1]|[DATE 2]|[NUMBER 1]|[NUMBER 2]

2.b) Encode or compress data by type (encode numbers using base64[2] or similar), as in 1). This even allows you to suppress separators. Like:

[DATE 1][DATE 2][NUMBER 1][NUMBER 2][PHONE][MAIL]|[TEXT 1]|[TEXT 2]
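Variant 2.a can be sketched as follows. The field list mirrors the pattern above; the function names are hypothetical, and the sketch assumes that no value contains the '|' separator and that sender and receiver agree on the field order.

```php
<?php
// Fixed field order, known to both sides, so the keys never travel in the URL.
const FIELDS = ['name', 'phone', 'mail', 'desc', 'create', 'modified', 'first', 'last'];

// Keep only the values, in the agreed order, joined by '|'.
function pack_fields(array $params): string
{
    $values = [];
    foreach (FIELDS as $field) {
        $values[] = $params[$field] ?? '';
    }
    return implode('|', $values);
}

// Rebuild the key => value array from the packed string.
function unpack_fields(string $packed): array
{
    return array_combine(FIELDS, explode('|', $packed, count(FIELDS)));
}

$params = [
    'name' => 'Ann', 'phone' => '555', 'mail' => 'ann@example.com', 'desc' => 'hi',
    'create' => '519C070E', 'modified' => '519C070F', 'first' => '1', 'last' => '9',
];
$packed = pack_fields($params);
echo $packed, "\n";
```

The packed string can then be compressed or encoded as a single parameter value.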

3) Coding:

3.a) While it is true that if we encode the compressed data with characters not allowed in URLs, they will be transformed into heavier ones (like 'año' => 'a%C3%B1o'), that can still be useful. Maybe you want to compress it to store it in a Unicode or binary database, or to paste it on web sites (Facebook, Twitter, etc.).

3.b) Although Base64[2] is a good method, you can squeeze out more at the expense of speed (since you would use user-defined functions instead of compiled ones).

At least with JavaScript's encodeURI() function, you can use any of these 80 characters in a parameter value without them being modified:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.:,;+*-_/()[email protected]?~'

So, we can build our own "Base 80" encode/decode functions.
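A sketch of such a custom-base encoder for non-negative integers. Because the character list above is partly garbled, this example does not use it; it assumes a smaller alphabet instead, the 66 "unreserved" URL characters from RFC 3986, so it is really a "Base 66". The function names are hypothetical, and swapping in a longer alphabet only requires changing the constant.

```php
<?php
// Assumed alphabet: digits, letters, and the four unreserved marks "-._~" (66 characters).
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-._~';

// Encode a non-negative integer using the characters of ALPHABET as digits.
function int_to_base(int $number): string
{
    $alphabet = ALPHABET;
    $base = strlen($alphabet);
    $digits = '';
    do {
        $digits = $alphabet[$number % $base] . $digits;
        $number = intdiv($number, $base);
    } while ($number > 0);
    return $digits;
}

// Decode a string produced by int_to_base(); characters outside ALPHABET are not checked.
function base_to_int(string $digits): int
{
    $base = strlen(ALPHABET);
    $number = 0;
    foreach (str_split($digits) as $char) {
        $number = $number * $base + (int) strpos(ALPHABET, $char);
    }
    return $number;
}

// The Unix timestamp 1369179918 takes 10 decimal digits or 8 hex digits.
echo strlen(int_to_base(1369179918)), "\n"; // 6
```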

  • [1] http://wpdia.com.ar/n3paH "Unix time"
  • [2] http://wpdia.com.ar/no.9 "Base 64"
  • [3] http://scholar.google.com.ar/scholar?hl=en&q=syllabic+lossless+compression "Syllabic lossless compression at Google Scholar"
Answer:

Not really an answer, but a comparison of various methods suggested here.

Used answers by @Gumbo and @deceze to get length comparison for a fairly long string I am using in a GET.

<?php
    $test_str="33036,33037,33038,38780,38772,37671,36531,38360,39173,38676,37888,36828,39176,39196,37321,36840,38519,37946,36543,39287,38989,38976,36804,38880,38922,38292,38507,38893,38993,39035,37880,38897,38378,36880,38492,38910,36868,38196,38750,37938,39268,38209,36856,36767,37936,36805,39248,36777,39027,39056,38987,38779,38919,38771,36851,38675,37887,38246,38791,38783,38661,37899,36846,36834,39263,37928,36822,37947,38992,38516,39177,38904,38896,37320,39217,37879,38293,38511,38774,37670,38185,37927,37939,38286,38298,38977,37891,38881,38197,38457,36962,39171,36760,36748,39249,39231,39191,36951,36963,36755,38769,38891,38654,38792,36863,36875,36956,36968,38978,38299,36743,36753,37896,38926,39270,38372,37948,39250,38763,38190,38678,36761,37925,36776,36844,37323,38781,38744,38321,38202,38793,38510,38288,36816,38384,37906,38184,38192,38745,39218,38673,39178,39198,39036,38504,36754,39180,37919,38768,38195,36850,38203,38672,38882,38071,39189,36795,36783,38870,38764,39028,36762,36750,38980,36958,37924,38884,37920,38877,36858,38493,36742,37895,36835,37907,36823,38762,38361,37937,38373,37949,36950,39202,38495,38291,36533,39037,36716,38925,37620,38906,37878,37322,38754,36818,39029,39264,38297,38517,36969,38905,36957,36789,36741,37908,38302,38775,39216,36812,38767,36845,36849,39181,39168,38671,39188,38490,36961,39201,36717,38382,38070,37868,38984,36770,38981,38494,36807,38885,36759,36857,38924,39038,38888,38876,36879,37897,36534,36764,37931,38254,39030,38990,37909,38982,38290,36848,37857,37923,38249,38658,38383,36813,36765,36817,37263,36769,37869,38183,36861,38206,39031,36800,36788,36972,38508,38303,39051,38491,38983,38759,36740,37958,36967,37930,39174,39182,36806,36867,36855,39222,37862,36752,38242,37965,38894,38182,37922,37918,36814,36872,38886,36860,36527,38194,38975,36718,39224,37436,39032";

    echo(strlen($test_str)); echo("<br>");

    echo(strlen(base64_encode(gzcompress($test_str,9)))); echo("<br>");

    echo(strlen(bin2hex(gzcompress($test_str, 9)))); echo("<br>");

    echo(strlen(urlencode(gzcompress($test_str, 9)))); echo("<br>");

    echo(strlen(rtrim(strtr(base64_encode(gzdeflate($test_str, 9)), '+/', '-_'), '=')));
?>

Here are the results:

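1799  (original string length)
928   (base64_encode + gzcompress; 51.58% of the original length)
1388  (bin2hex + gzcompress)
1712  (urlencode + gzcompress)
918   (URL-safe base64_encode + gzdeflate; 51.028% of the original length)

Results are comparable for base64_encode with gzcompress and base64_encode with gzdeflate (plus some string translations). gzdeflate seems to give slightly better efficiency.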

Answer:

For long or very long string values, you should use the POST method instead of GET!

For a good encoding you might want to try urlencode()/urldecode()

or htmlentities()/html_entity_decode()

Also be careful: '%2F' is interpreted by the browser as the '/' character (directory separator). If you use only urlencode, you might want to replace it.

I don't recommend gzcompress on GET parameters.

Answer:

These functions will compress and decompress a string or an array.

Sometimes you might want to GET an array.

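// Serialize, compress, and encode a string or array into URL-safe text.
// addslashes()/stripslashes() cancel each other out; they are redundant here
// and only make the encoded output longer.
function _encode_string_array ($stringArray) {
    $s = strtr(base64_encode(addslashes(gzcompress(serialize($stringArray),9))), '+/=', '-_,');
    return $s;
}

// Reverse of _encode_string_array().
// Note: unserialize() is not safe to call on untrusted input.
function _decode_string_array ($stringArray) {
    $s = unserialize(gzuncompress(stripslashes(base64_decode(strtr($stringArray, '-_,', '+/=')))));
    return $s;
}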
Answer:

base64_encode makes the string unreadable (while of course easily decodable) but blows the volume up by 33%.

urlencode() turns any characters unsuitable for URLs into their URL-encoded counterparts. If your aim is to make the string work in the URL, this may be the right way for you.

If you have a session running, you could also consider putting the query string into a session variable with a random (small) number, and put that random number into the GET string. This method won't survive longer than the current session, of course.

Note that a GET string should never exceed 1-2 kilobytes in size due to server and browser limitations.
