Here's something Larry suggested: if a U is the first active format during a pack (for example, pack "U3C8", @stuff), then the resulting string should be treated as UTF-8 encoded.

If you are working with a git clone of the Perl repository, you will want to create a branch for your changes. This will make creating a proper patch much simpler. See perlgit for details on how to do this.

Writing the patch

How do we prepare to fix this up? First we locate the code in question - the pack happens at runtime, so it's going to be in one of the pp files. Sure enough, pp_pack is in pp.c. Since we're going to be altering this file, let's copy it to pp.c~.

[Well, it was in pp.c when this tutorial was written. It has now been split off with pp_unpack to its own file, pp_pack.c]

Now let's look over pp_pack: we take a pattern into pat, and then loop over the pattern, taking each format character in turn into datumtype. Then for each possible format character, we swallow up the other arguments in the pattern (a field width, an asterisk, and so on) and convert the next chunk of input into the specified format, adding it onto the output SV cat.
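To make the shape of that loop concrete, here is a minimal standalone sketch in plain C. It is not the code from pp_pack: the pack_item struct and the scan_template function are hypothetical names invented for illustration, and the sketch only tokenizes the pattern (format character plus optional count or asterisk) without converting any input.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record for one parsed pattern item; not a Perl core type. */
typedef struct {
    char type;   /* format character, e.g. 'U' or 'C' */
    long count;  /* repeat count; -1 stands for '*'; 1 if omitted */
} pack_item;

/* Walk the pattern as the text describes: take one format character,
 * then swallow its optional count or asterisk. Returns the number of
 * items written to out (at most max_items). */
size_t scan_template(const char *pat, size_t patlen,
                     pack_item *out, size_t max_items)
{
    const char *patend = pat + patlen;
    size_t n = 0;

    while (pat < patend && n < max_items) {
        char datumtype = *pat++;
        long len = 1;                      /* default repeat count */

        if (isspace((unsigned char)datumtype))
            continue;                      /* whitespace is not a format */

        if (pat < patend && *pat == '*') {
            len = -1;                      /* '*' means "as many as needed" */
            pat++;
        } else if (pat < patend && isdigit((unsigned char)*pat)) {
            len = 0;
            while (pat < patend && isdigit((unsigned char)*pat))
                len = len * 10 + (*pat++ - '0');
        }

        out[n].type = datumtype;
        out[n].count = len;
        n++;
    }
    return n;
}
```

Under these assumptions, scanning "U3C8" yields two items, U with count 3 and C with count 8, which is the point at which the real function would go on to consume arguments from the stack.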

How do we know if the U is the first format in the pat? Well, if we have a pointer to the start of pat then, if we see a U we can test whether we're still at the start of the string. So, here's where pat is set up:

    STRLEN fromlen;
    char *pat = SvPVx(*++MARK, fromlen);
    char *patend = pat + fromlen;
    I32 len;
    I32 datumtype;
    SV *fromstr;

We'll have another string pointer in there:

    STRLEN fromlen;
    char *pat = SvPVx(*++MARK, fromlen);
    char *patend = pat + fromlen;
 +  char *patcopy;
    I32 len;
    I32 datumtype;
    SV *fromstr;

And just before we start the loop, we'll set patcopy to be the start of pat:

    items = SP - MARK;
    MARK++;
    SvPVCLEAR(cat);
 +  patcopy = pat;
    while (pat < patend) {

Now if we see a U which was at the start of the string, we turn on the UTF8 flag for the output SV, cat:

 +  if (datumtype == 'U' && pat==patcopy+1)
 +      SvUTF8_on(cat);
    if (datumtype == '#') {
        while (pat < patend && *pat != '\n')
            pat++;

Remember that it has to be patcopy+1 because the first character of the string is the U which has been swallowed into datumtype!

Oops, we forgot one thing: what if there are spaces at the start of the pattern? pack(" U*", @stuff) will have U as the first active character, even though it's not the first thing in the pattern. In this case, we have to advance patcopy along with pat when we see spaces:

    if (isSPACE(datumtype))
        continue;

needs to become

    if (isSPACE(datumtype)) {
        patcopy++;
        continue;
    }
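The pointer bookkeeping can be tried outside the Perl source. The following is a rough standalone sketch in plain C, not core code: first_active_is_U is a hypothetical function name, the UTF8 flag is modelled as a plain bool, and counts and arguments are deliberately not consumed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the first non-whitespace format in the pattern is 'U'.
 * Mirrors the pat/patcopy bookkeeping from the patch; the function name
 * and the bool standing in for the UTF8 flag are illustrative only. */
bool first_active_is_U(const char *pat, size_t patlen)
{
    const char *patend = pat + patlen;
    const char *patcopy = pat;       /* start of the "active" pattern */
    bool utf8 = false;

    while (pat < patend) {
        char datumtype = *pat++;     /* pat now points one past the format */

        if (isspace((unsigned char)datumtype)) {
            patcopy++;               /* leading spaces move the start along */
            continue;
        }
        /* Only the first active format can satisfy pat == patcopy + 1. */
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = true;
        /* The real loop would consume counts and arguments here. */
    }
    return utf8;
}
```

With this sketch, "U3C8" and "  U*" both report true, while "C0U*" and "C U" report false: once any non-space character has been read, pat is always more than one position ahead of patcopy.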

OK. That's the C part done. Now we must do two additional things before this patch is ready to go: we've changed the behaviour of Perl, and so we must document that change. We must also provide some more regression tests to make sure our patch works and doesn't create a bug somewhere else along the line.

Testing the patch

The regression tests for each operator live in t/op/, and so we make a copy of t/op/pack.t to t/op/pack.t~. Now we can add our tests to the end. First, we'll test that the U does indeed create Unicode strings.

t/op/pack.t has a sensible ok() function, but if it didn't we could use the one from t/test.pl.

 require './test.pl';
 plan( tests => 159 );

so instead of this:

 print 'not ' unless "1.20.300.4000" eq sprintf "%vd",
                                               pack("U*",1,20,300,4000);
 print "ok $test\n"; $test++;

we can write the more sensible version below (see Test::More for a full explanation of is() and other testing functions):

 is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
                                       "U* produces Unicode" );

Now we'll test that we got that space-at-the-beginning business right:

 is( "1.20.300.4000", sprintf("%vd", pack("  U*",1,20,300,4000)),
                                     "  with spaces at the beginning" );

And finally we'll test that we don't make Unicode strings if U is not the first active format:

 isnt( v1.20.300.4000, sprintf("%vd", pack("C0U*",1,20,300,4000)),
                                       "U* not first isn't Unicode" );

Mustn't forget to change the number of tests which appears at the top, or else the automated tester will get confused. This will either look like this:

 print "1..156\n";

or this:

 plan( tests => 156 );

We now compile up Perl, and run it through the test suite. Our new tests pass, hooray!

Documenting the patch

Finally, the documentation. The job is never done until the paperwork is over, so let's describe the change we've just made. The relevant place is pod/perlfunc.pod; again, we make a copy, and then we'll insert this text in the description of pack:

 =item *

 If the pattern begins with a C<U>, the resulting string will be treated
 as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
 with an initial C<U0>, and the bytes that follow will be interpreted as
 Unicode characters. If you don't want this to happen, you can begin
 your pattern with C<C0> (or anything else) to force Perl not to UTF-8
 encode your string, and then follow this with a C<U*> somewhere in your
 pattern.

Submit

See perlhack for details on how to submit this patch.

AUTHOR

This document was originally written by Nathan Torkington, and is maintained by the perl5-porters mailing list.