User talk:Jleith

From FreeBio

Hi Jason! Thanks for editing! --ASW 17:08, 25 January 2006 (EST)

29-Nov perl snag

I want to assign "a", "t", "g", or "c" randomly to a particular position. I check to make sure that this mutation is indeed a mutation, i.e., that a nucleotide is not being "replaced" by the same nucleotide.

The code is:

        ...
        my $pm;
        my @atgc;
        ...
        @atgc = ("a","t","g","c");
        $pm = $atgc[int rand(4)];
        if ($pm != substr($refgen,$ibp,1)) { 
        ...

where "$refgen" is a string that is the reference genome, "$ibp" is the counter for the base pair we're on, and "$pm" is what $ibp has a (very small) chance of mutating to.

The error message I get is:

Argument "g" isn't numeric in numeric ne (!=) at ./29nov.test.pl line 76, <REF> line 5.

Line 76 is the "if" line, and <REF> is the handle for the reference genome being read from a file.

From the resources on the web I've seen, "!=" should be able to deal with strings as well as numbers. What gives?

Thanks.


I'd suggest trying the equivalent string comparison operator in this case:

  if ($pm ne substr($refgen,$ibp,1)) { 

Check out this link for more info: Perl comparisons.Alternatively, you could just not even make the comparison, and scale your overall probability of a mutation to include the fact that you'll get synonymous mutations 1/4 of the time. --smd 01:46, 28 November 2005 (EST)



Thanks Shawn for your help above. Since I'm not actually creating a copy of the whole genome but rather a list of the deviations from the reference genome, I didn't want to create a list where 1/4 of the entries are synonymous mutations. I've gotten ambitious and am trying to store the files in different directories, so as not to clutter up the working directory. All my directory-manipulation stuff seems to be passed over as if it weren't there, that is, the files get created in the working directory (~/101) rather than in the directories intended to store the copy genomes (~/101/29novgenomecopies/mutratehigh and ~/101/29novgenomecopies/mutratenorm).

The relevant code is:

use Cwd;
my $workingdir = getcwd;
...
for ($r = 1; $r <=2; $r++) {

  if ($r==1){
    chdir "~/101/29novgenomecopies/mutratenorm", 0755 or die "cannot chdir to /22novgenomecopies/mutratenorm: $!";
    $mutrate = $mutratenorm;
    $mutname = $outputformat . 'normmut.';
  } else {
    chdir "~/101/29novgenomecopies/mutratehigh", 0755 or die "cannot chdir to /22novgenomecopies/mutratehigh: $!";
    $mutrate = $mutratehigh;
    $mutname = $outputformat . 'highmut.';
  }
  ...
  chdir "$workingdir" or die "cannot return to working directory: $!";
}

The whole thing is:

#!/usr/bin/perl -w

use strict;

my $mutratenorm = 1e-6; # normal mutation rate (for E. coli, at least).
                    # can change this to a greater value for a higher mutation rate.
my $mutratehigh = 1e-4;
my $source = 'ecoliK-12genome.txt';
# my $source = 'test.txt';
my $refgen;
my $numcopies = 100;
my $outputformat = 'ecoliK-12genomecopy.'; # to have 'n' appended, where n goes
                                           # from 1 to 100.
use Cwd;
my $workingdir = getcwd;

#read in the genome, modify by getting rid of the numbers and whitespaces.

open(REF, $source);
<REF>; #throw out the first line
while(<REF>) {
        $_ = substr($_,10); #start after the numbers
	$refgen = $refgen . $_;
}
$refgen =~ s/\s+//g; #remove all whitespace

my $r;
my $mutrate;
my $mutname;
my $i;
my $filename;
my $ibp;
my @atgc;
my $pm;

for ($r = 1; $r <=2; $r++) {

  if ($r==1){
    chdir "~/101/29novgenomecopies/mutratenorm", 0755 or die "cannot chdir to /22novgenomecopies/mutratenorm: $!";
    $mutrate = $mutratenorm;
    $mutname = $outputformat . 'normmut.';
  } else {
    chdir "~/101/29novgenomecopies/mutratehigh", 0755 or die "cannot chdir to /22novgenomecopies/mutratehigh: $!";
    $mutrate = $mutratehigh;
    $mutname = $outputformat . 'highmut.';
  }

  for ($i = 1; $i <= $numcopies; $i++) {
    $filename = "$mutname" . "$i";
    open(OUT,">$filename");
    for($ibp = 0; $ibp < length($refgen); $ibp++) { # the ref genome is
                                                    # considered to start at 0
      if (int rand (( (4 / 3)*(1 / $mutrate) ) + 1)){ } else {

        @atgc = ("a","t","g","c");
        $pm = $atgc[int rand(4)];
      
        if ($pm ne substr($refgen,$ibp,1)) {
	  print OUT "$ibp $pm\n";
        }
      }
    }
    print OUT "mutation rate = $mutrate";
    close(OUT);
  }
  chdir "$workingdir" or die "cannot return to working directory: $!";
}